A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons for cancellations include changes of plans and scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost, which benefits hotel guests but is a less desirable and possibly revenue-diminishing factor for hotels. Such losses are particularly high for last-minute cancellations.
The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.
The cancellation of bookings impacts a hotel on various fronts:
The increasing number of cancellations calls for a Machine Learning based solution that can help predict which bookings are likely to be canceled. INN Hotels Group has a chain of hotels in Portugal. They are facing problems with the high number of booking cancellations and have reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict in advance which bookings are going to be canceled, and help in formulating profitable policies for cancellations and refunds.
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
Data Dictionary
# this will help in making the Python code more structured automatically (good coding practice)
%load_ext nb_black
import warnings
warnings.filterwarnings("ignore")
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter("ignore", ConvergenceWarning)
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np
# Library to split data
from sklearn.model_selection import train_test_split
# libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# To build model for prediction
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
from sklearn.linear_model import LogisticRegression
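# To get different metric scores
# Note: plot_confusion_matrix was removed in scikit-learn 1.2; on newer versions use ConfusionMatrixDisplay instead
from sklearn.metrics import (
    f1_score,
    accuracy_score,
    recall_score,
    precision_score,
    confusion_matrix,
    roc_auc_score,
    plot_confusion_matrix,
    precision_recall_curve,
    roc_curve,
    make_scorer,
)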
# Libraries to build decision tree classifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
# To tune different models
from sklearn.model_selection import GridSearchCV
# code to load the csv file
INN = pd.read_csv("INNHotelsGroup.csv")
# copying data to another variable to avoid any changes to original data
data = INN.copy()
# code to see the first five rows of data
data.head()
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
| 2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled |
| 3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled |
| 4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
# code to see the last five rows of data
data.tail()
# code to understand the shape of the data
data.shape
(36275, 19)
# code to see data type and number of null values
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 19 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   Booking_ID                            36275 non-null  object
 1   no_of_adults                          36275 non-null  int64
 2   no_of_children                        36275 non-null  int64
 3   no_of_weekend_nights                  36275 non-null  int64
 4   no_of_week_nights                     36275 non-null  int64
 5   type_of_meal_plan                     36275 non-null  object
 6   required_car_parking_space            36275 non-null  int64
 7   room_type_reserved                    36275 non-null  object
 8   lead_time                             36275 non-null  int64
 9   arrival_year                          36275 non-null  int64
 10  arrival_month                         36275 non-null  int64
 11  arrival_date                          36275 non-null  int64
 12  market_segment_type                   36275 non-null  object
 13  repeated_guest                        36275 non-null  int64
 14  no_of_previous_cancellations          36275 non-null  int64
 15  no_of_previous_bookings_not_canceled  36275 non-null  int64
 16  avg_price_per_room                    36275 non-null  float64
 17  no_of_special_requests                36275 non-null  int64
 18  booking_status                        36275 non-null  object
dtypes: float64(1), int64(13), object(5)
memory usage: 5.3+ MB
Observation
# looking at value counts for non-numeric features
num_to_display = 10  # defining this up here so it's easy to change later if I want
for colname in data.dtypes[data.dtypes == "object"].index:
    val_counts = data[colname].value_counts(dropna=False)  # i want to see NA counts
    print(val_counts[:num_to_display])
    if len(val_counts) > num_to_display:
        print(f"Only displaying first {num_to_display} of {len(val_counts)} values.")
    print("\n\n")  # just for more space between
INN10712    1
INN14258    1
INN00237    1
INN07840    1
INN07373    1
INN30692    1
INN10175    1
INN03773    1
INN28414    1
INN35318    1
Name: Booking_ID, dtype: int64
Only displaying first 10 of 36275 values.

Meal Plan 1     27835
Not Selected     5130
Meal Plan 2      3305
Meal Plan 3         5
Name: type_of_meal_plan, dtype: int64

Room_Type 1    28130
Room_Type 4     6057
Room_Type 6      966
Room_Type 2      692
Room_Type 5      265
Room_Type 7      158
Room_Type 3        7
Name: room_type_reserved, dtype: int64

Online           23214
Offline          10528
Corporate         2017
Complementary      391
Aviation           125
Name: market_segment_type, dtype: int64

Not_Canceled    24390
Canceled        11885
Name: booking_status, dtype: int64
Observations
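- Meal Plan 1 is the most common meal plan (about 76% of bookings).
- Room_Type 1 is the most reserved room type (about 78%).
- Online is the largest market segment (about 64%).
- About 67% of bookings are Not_Canceled. For the remaining 33% that are cancelled, it will be useful to analyze the segments that have the most cancellations.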
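Shares like these can be read straight from pandas instead of being worked out by hand from the counts. A minimal sketch, assuming a small made-up frame `toy` (hypothetical, not the INN Hotels data) in place of `data`:

```python
import pandas as pd

# toy is a hypothetical stand-in for the booking data
toy = pd.DataFrame(
    {"booking_status": ["Not_Canceled", "Canceled", "Not_Canceled", "Not_Canceled"]}
)

# normalize=True returns proportions instead of raw counts
shares = toy["booking_status"].value_counts(normalize=True).mul(100).round(1)
print(shares)
```

On the real data the same call would be applied to each column of interest, for example `data["type_of_meal_plan"]`.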
# looking at value counts for numeric (integer) features
num_to_display = 20  # defining this up here so it's easy to change later if I want
for colname in data.dtypes[data.dtypes == "int64"].index:
    val_counts = data[colname].value_counts(dropna=False)  # i want to see NA counts
    print(val_counts[:num_to_display])
    if len(val_counts) > num_to_display:
        print(f"Only displaying first {num_to_display} of {len(val_counts)} values.")
    print("\n\n")  # just for more space between
2    26108
1     7695
3     2317
0      139
4       16
Name: no_of_adults, dtype: int64

0     33577
1      1618
2      1058
3        19
9         2
10        1
Name: no_of_children, dtype: int64

0    16872
1     9995
2     9071
3      153
4      129
5       34
6       20
7        1
Name: no_of_weekend_nights, dtype: int64

2     11444
1      9488
3      7839
4      2990
0      2387
5      1614
6       189
7       113
10       62
8        62
9        34
11       17
15       10
12        9
14        7
13        5
17        3
16        2
Name: no_of_week_nights, dtype: int64

0    35151
1     1124
Name: required_car_parking_space, dtype: int64

0     1297
1     1078
2      643
3      630
4      628
5      577
6      519
8      436
7      429
12     412
14     384
11     371
37     337
39     335
9      332
13     332
10     317
19     316
18     302
15     298
Name: lead_time, dtype: int64
Only displaying first 20 of 352 values.

2018    29761
2017     6514
Name: arrival_year, dtype: int64

10    5317
9     4611
8     3813
6     3203
12    3021
11    2980
7     2920
4     2736
5     2598
3     2358
2     1704
1     1014
Name: arrival_month, dtype: int64

13    1358
17    1345
2     1331
19    1327
4     1327
16    1306
20    1281
6     1273
15    1273
18    1260
14    1242
30    1216
12    1204
8     1198
29    1190
21    1158
5     1154
26    1146
25    1146
1     1133
Name: arrival_date, dtype: int64
Only displaying first 20 of 31 values.

0    35345
1      930
Name: repeated_guest, dtype: int64

0     35937
1       198
2        46
3        43
11       25
5        11
4        10
13        4
6         1
Name: no_of_previous_cancellations, dtype: int64

0     35463
1       228
2       112
3        80
4        65
5        60
6        36
7        24
8        23
10       19
9        19
11       15
12       12
14        9
15        8
13        7
16        7
20        6
17        6
18        6
Name: no_of_previous_bookings_not_canceled, dtype: int64
Only displaying first 20 of 59 values.

0    19777
1    11373
2     4364
3      675
4       78
5        8
Name: no_of_special_requests, dtype: int64
Overall Observations
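- Meal Plan 1 was the most selected meal plan (27835, about 76%), while 5130 bookings did not select any plan at all.
- Room_Type 1 was the most reserved room type (28130, about 78%).
- Online was the largest market segment (23214, about 64%).
- Almost 33% of the bookings were cancelled (11885).
- Most of the bookings were from guests with no children (33577, about 93%).
- Bookings for 2 adults were the most common (26108, about 72%).
- 0 weekend nights was the most common weekend stay (about 47%).
- 2 week nights appears to be the most popular booking overall (about 32%).
- October has the most arrivals, followed by September.
- 930 guests were repeated customers, while 35345 were first time guests.
- For no_of_previous_cancellations, 0 was by far the largest group (35937, about 99%).
- For no_of_previous_bookings_not_canceled, 0 was the largest group in this category (35463, about 98%).
- Most bookings had no special request (19777).
- no_of_previous_cancellations and no_of_previous_bookings_not_canceled appear to carry similar information.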
# code to generate the numeric data description with the rows and columns transposed
data.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| no_of_adults | 36275.0 | 1.844962 | 0.518715 | 0.0 | 2.0 | 2.00 | 2.0 | 4.0 |
| no_of_children | 36275.0 | 0.105279 | 0.402648 | 0.0 | 0.0 | 0.00 | 0.0 | 10.0 |
| no_of_weekend_nights | 36275.0 | 0.810724 | 0.870644 | 0.0 | 0.0 | 1.00 | 2.0 | 7.0 |
| no_of_week_nights | 36275.0 | 2.204300 | 1.410905 | 0.0 | 1.0 | 2.00 | 3.0 | 17.0 |
| required_car_parking_space | 36275.0 | 0.030986 | 0.173281 | 0.0 | 0.0 | 0.00 | 0.0 | 1.0 |
| lead_time | 36275.0 | 85.232557 | 85.930817 | 0.0 | 17.0 | 57.00 | 126.0 | 443.0 |
| arrival_year | 36275.0 | 2017.820427 | 0.383836 | 2017.0 | 2018.0 | 2018.00 | 2018.0 | 2018.0 |
| arrival_month | 36275.0 | 7.423653 | 3.069894 | 1.0 | 5.0 | 8.00 | 10.0 | 12.0 |
| arrival_date | 36275.0 | 15.596995 | 8.740447 | 1.0 | 8.0 | 16.00 | 23.0 | 31.0 |
| repeated_guest | 36275.0 | 0.025637 | 0.158053 | 0.0 | 0.0 | 0.00 | 0.0 | 1.0 |
| no_of_previous_cancellations | 36275.0 | 0.023349 | 0.368331 | 0.0 | 0.0 | 0.00 | 0.0 | 13.0 |
| no_of_previous_bookings_not_canceled | 36275.0 | 0.153411 | 1.754171 | 0.0 | 0.0 | 0.00 | 0.0 | 58.0 |
| avg_price_per_room | 36275.0 | 103.423539 | 35.089424 | 0.0 | 80.3 | 99.45 | 120.0 | 540.0 |
| no_of_special_requests | 36275.0 | 0.619655 | 0.786236 | 0.0 | 0.0 | 0.00 | 1.0 | 5.0 |
Observation
- no_of_adults: On average there were 2 adults per room.
- no_of_children: There is a large difference between the minimum and maximum number of children. There may be outliers present in this variable.
- no_of_weekend_nights: On average occupants stayed one weekend night. There is a large difference between the 75th percentile and the max. There may be outliers present in this variable.
- no_of_week_nights: On average occupants stayed two week nights. There is a large difference between the 75th percentile and the max value. There may be outliers present in this variable.
Questions:
## code to create a histogram combined with a box plot
def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12,7))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a triangle marker will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
# Code to create a histogram box plot
histogram_boxplot(data, "no_of_adults", bins=100)
Observations
# Code to create a histogram box plot
histogram_boxplot(data, "no_of_children", bins=100)
Observations
# Code to create a histogram box plot
histogram_boxplot(data, "no_of_weekend_nights", bins=100)
Observations
# Code to create a histogram box plot
histogram_boxplot(data, "no_of_week_nights", bins=100)
Observations
# Code to create a histogram box plot
histogram_boxplot(data, "required_car_parking_space", bins=100)
Observations
# Code to create a histogram box plot
histogram_boxplot(data, "lead_time", bins=100)
Observations
# Code to create a histogram box plot
histogram_boxplot(data, "arrival_year", bins=100)
Observations
# Code to create a histogram box plot
histogram_boxplot(data, "arrival_month", bins=100)
Observations
- October appears to be the most popular month, with just over 5000 arrivals.
- January appears to be the least popular month for hotel bookings, with around 1000 arrivals.
# Code to create a histogram box plot
histogram_boxplot(data, "arrival_date", bins=100)
Observations
# Code to create a histogram box plot
histogram_boxplot(data, "repeated_guest", bins=50)
Observations
Most bookings (35345) came from first time guests.
# Code to create a histogram box plot
histogram_boxplot(data, "no_of_previous_cancellations", bins=50)
Observations
# Code to create a histogram box plot
histogram_boxplot(data, "no_of_previous_bookings_not_canceled", bins=10)
# Code to create a histogram box plot
histogram_boxplot(data, "avg_price_per_room", bins=100)
# Code to create a histogram box plot
histogram_boxplot(data, "no_of_special_requests", bins=100)
Observations
# Converting some of the numeric data to categorical since they are discrete variables.
# Those listed as objects will also be converted to categorical to save on space.
# The conversion to categorical is being done here in preparation for the regression and decision tree modelling to come later on
data["type_of_meal_plan"] = data["type_of_meal_plan"].astype("category")
data["room_type_reserved"] = data["room_type_reserved"].astype("category")
data["market_segment_type"] = data["market_segment_type"].astype("category")
data["booking_status"] = data["booking_status"].astype("category")
data["arrival_year"] = data["arrival_year"].astype("category")
data["required_car_parking_space"] = data["required_car_parking_space"].astype("category")
data["no_of_adults"] = data["no_of_adults"].astype("category")
data["no_of_children"] = data["no_of_children"].astype("category")
# function to create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )
    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # height of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage
    plt.show()  # show the plot
# Code to create a labeled bar plot
labeled_barplot(data, "no_of_adults", perc=True)
Observation
# Code to create a labeled bar plot
labeled_barplot(data, "no_of_children", perc=True)
Observations
# Code to create a labeled bar plot
labeled_barplot(data, "no_of_weekend_nights", perc=True)
Observations
- 1 weekend night stay: about 27.6% of bookings
- 2 weekend night stays: about 25.0% of bookings
# Code to create a labeled bar plot
labeled_barplot(data, "no_of_week_nights", perc=True)
Observations
- 2 week night stays: 31.5%
- 1 week night stay: 26.2%
- 3 week night stays: about 21.6%
# Code to create a labeled bar plot
labeled_barplot(data, "type_of_meal_plan", perc=True)
Observation
- Meal Plan 1 is the most selected meal plan.
- Meal Plan 3 is the least selected meal plan (5 bookings).
# Code to create a labeled bar plot
labeled_barplot(data, "required_car_parking_space", perc=True)
Observation
# Code to create a labeled bar plot
labeled_barplot(data, "room_type_reserved", perc=True)
Observation
- Room_Type 1 is the most reserved room type.
- Room_Type 4 is the second most reserved room type.
# Code to create a labeled bar plot
labeled_barplot(data, "arrival_year", perc=True)
Observations
# Code to create a labeled bar plot
labeled_barplot(data, "arrival_month", perc=True)
Observations
- October was the busiest month for arrivals - 14.7%
- September is the next most popular month for arrivals - 12.7%
- August was the third busiest month for arrivals - 10.5%
# Code to create a labeled bar plot
labeled_barplot(data, "arrival_date", perc=True)
Observations
# Code to create a labeled bar plot
labeled_barplot(data, "market_segment_type", perc=True)
Observations
Online is the largest market segment at 64%, followed by Offline and Corporate. Aviation and Complementary are the smallest segments.
# Code to create a labeled bar plot
labeled_barplot(data, "repeated_guest", perc=True)
# code to see the counts of each values in the series
data["repeated_guest"].value_counts()
0    35345
1      930
Name: repeated_guest, dtype: int64
Observations
Only 930 bookings (about 2.6%) came from repeated_guests.
# Code to create a labeled bar plot
labeled_barplot(data, "no_of_previous_cancellations", perc=True)
Observations
# Code to create a labeled bar plot
labeled_barplot(data, "no_of_special_requests", perc=True)
Observations
31.4% of the bookings had 1 special request.
It will be worthwhile investigating later on whether there is any relationship between special requests and cancellations.
# Code to create a labeled bar plot
labeled_barplot(data, "booking_status", perc=True)
data["booking_status"].value_counts()
Not_Canceled    24390
Canceled        11885
Name: booking_status, dtype: int64
Observations
# here we generate a heat map of the independent variables
plt.figure(figsize=(15, 7))
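# restrict to numeric columns so that corr() does not fail on the categorical and object columns
sns.heatmap(data.select_dtypes(include=np.number).corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")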
plt.show()
Observations
no_of_previous_bookings_not_canceled, no_of_previous_cancellations and repeated_guest
# let's create a dataframe consisting of the avg_price_per_room and market_segment_type columns
data2 = data[["avg_price_per_room", "market_segment_type"]].copy()
# This code groups the dataframe by market segment, takes the mean of the average room price per segment, and sorts the result in descending order
data3 = (
    data2.groupby("market_segment_type")[["avg_price_per_room"]]
    .mean()
    .sort_values(by=["avg_price_per_room"], ascending=False, inplace=False)
)
# This code resets the index so that the variable (market segment type) is now an available column to use in further data manipulations such as plots
data4 = data3.reset_index()
data4
| | market_segment_type | avg_price_per_room |
|---|---|---|
| 0 | Online | 112.256855 |
| 1 | Aviation | 100.704000 |
| 2 | Offline | 91.632679 |
| 3 | Corporate | 82.911740 |
| 4 | Complementary | 3.141765 |
# Here we are plotting Mean of avg room price vs the Market Segment Type to have a visual of the avg prices across the market segments
plt.figure(figsize=(20, 7))
graph = sns.barplot(y=data4["avg_price_per_room"], x=data4["market_segment_type"])
graph.set_title("Mean of avg room price vs Market Segment Type")
plt.xticks(rotation=70)
for p in graph.patches:
    graph.annotate(
        "%.0f" % p.get_height(),
        (p.get_x() + p.get_width() / 2.0, p.get_height()),
        ha="center",
        va="center",
        fontsize=16,
        color="black",
        xytext=(0, 5),
        textcoords="offset points",
    )
plt.show()
## Here we find the number of repeated guest that have canceled by applying a filter on the original data set
data9 = data[data["booking_status"] == "Canceled"]
data10 = data9[data9["repeated_guest"] == 1]
data11 = data[data["repeated_guest"] == 1]
percentage = round((len(data10["repeated_guest"]) / len(data11["repeated_guest"])) * 100, 2)
print("The percentage of repeated guest that canceled is :", percentage,"%")
The percentage of repeated guest that canceled is : 1.72 %
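The same kind of rate can be obtained for every group at once by taking a group-wise mean of a boolean flag, instead of filtering into separate dataframes. A sketch on made-up rows (the frame `toy` and its values are illustrative only):

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "repeated_guest": [0, 0, 0, 1, 1, 1, 1, 0],
        "booking_status": [
            "Canceled", "Not_Canceled", "Canceled", "Not_Canceled",
            "Not_Canceled", "Canceled", "Not_Canceled", "Not_Canceled",
        ],
    }
)

# the mean of a True/False flag is the share of True values in each group
is_canceled = toy["booking_status"].eq("Canceled")
cancel_rate = is_canceled.groupby(toy["repeated_guest"]).mean().mul(100).round(2)
print(cancel_rate)
```

This gives the cancellation percentage for first time and repeated guests side by side, which makes the comparison between the two groups explicit.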
Observations
# Now we will convert no of repeated guest to categorical since these values are discrete 0 or 1 before outlier treatment
data["repeated_guest"] = data["repeated_guest"].astype("category")
# Code to create a labeled bar plot
labeled_barplot(data9, "no_of_special_requests", perc=True)
Observations
1 special request attached to them.
2 special requests attached to them.
# Code to create a labeled bar plot
labeled_barplot(data9, "market_segment_type", perc=True)
# data9 was filtered to show only cancelled bookings
plt.figure(figsize=(15, 12))
graph = sns.boxplot(
    y="arrival_month",
    x="no_of_special_requests",
    hue="market_segment_type",
    data=data9,
)
graph.set_title("Cancellations by Market Segment and Special Request vs Arrival Month")
plt.xticks(rotation=70)
# bar-height annotations are not used here: box plot patches have no get_height()
plt.show()
Observations
Cancellations are concentrated in bookings with 0, 1 or 2 special requests.
# data9 was filtered to show only cancelled bookings
plt.figure(figsize=(15, 7))
graph = sns.boxplot(y="lead_time", x="market_segment_type", hue=None, data=data9)
graph.set_title("Cancellations by Lead_time and Market_Segment")
plt.xticks(rotation=70)
# bar-height annotations are not used here: box plot patches have no get_height()
plt.show()
Observations
# let's look for missing values
data.isnull().sum()
Booking_ID                              0
no_of_adults                            0
no_of_children                          0
no_of_weekend_nights                    0
no_of_week_nights                       0
type_of_meal_plan                       0
required_car_parking_space              0
room_type_reserved                      0
lead_time                               0
arrival_year                            0
arrival_month                           0
arrival_date                            0
market_segment_type                     0
repeated_guest                          0
no_of_previous_cancellations            0
no_of_previous_bookings_not_canceled    0
avg_price_per_room                      0
no_of_special_requests                  0
booking_status                          0
dtype: int64
Observation
# counting fully duplicated rows
print("Duplicated Rows:", data.duplicated().sum())
Duplicated Rows: 0
Observation
numerical_col = data.select_dtypes(include=np.number).columns.tolist()
plt.figure(figsize=(20, 30))
for i, variable in enumerate(numerical_col):
    plt.subplot(5, 4, i + 1)
    plt.boxplot(data[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)
plt.show()
Observation
Some variables have both upper and lower outliers, some have only lower outliers, and some have only upper outliers.
# functions to treat outliers by flooring and capping
def treat_outliers(df, col):
    """
    Treats outliers in a variable

    df: dataframe
    col: dataframe column
    """
    Q1 = df[col].quantile(0.25)  # 25th quantile
    Q3 = df[col].quantile(0.75)  # 75th quantile
    IQR = Q3 - Q1
    Lower_Whisker = Q1 - 1.5 * IQR
    Upper_Whisker = Q3 + 1.5 * IQR
    # all the values smaller than Lower_Whisker will be assigned the value of Lower_Whisker
    # all the values greater than Upper_Whisker will be assigned the value of Upper_Whisker
    df[col] = np.clip(df[col], Lower_Whisker, Upper_Whisker)
    return df


def treat_outliers_all(df, col_list):
    """
    Treat outliers in a list of variables

    df: dataframe
    col_list: list of dataframe columns
    """
    for c in col_list:
        df = treat_outliers(df, c)
    return df


numerical_col = data.select_dtypes(include=np.number).columns.tolist()
data = treat_outliers_all(data, numerical_col)
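To make the flooring and capping rule concrete, here is a small worked example on made-up prices (the values are illustrative, not taken from the dataset). It applies the same Q1 - 1.5 * IQR and Q3 + 1.5 * IQR whiskers as `treat_outliers`:

```python
import pandas as pd

prices = pd.Series([80.0, 90.0, 100.0, 110.0, 120.0, 540.0])  # made-up values

q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# values outside [lower, upper] are pulled back to the nearest whisker
capped = prices.clip(lower, upper)
print(lower, upper)
print(capped.tolist())
```

Only the extreme value is changed; everything inside the whiskers is left alone. Note that when Q1 and Q3 coincide the IQR is 0, so this rule caps every value in the column to that single number.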
# let's look at box plot to see if outliers have been treated or not
plt.figure(figsize=(20, 30))
for i, variable in enumerate(numerical_col):
    plt.subplot(5, 4, i + 1)
    plt.boxplot(data[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)
plt.show()
# code to create a histogram box plot
histogram_boxplot(data, "avg_price_per_room", bins=20)
Observations
# code to create a histogram box plot
histogram_boxplot(data, "lead_time", bins=100)
Observation
# code to create a box plot
plt.figure(figsize=(15, 12))
sns.boxplot(
y="arrival_month", x="no_of_special_requests", hue="booking_status", data=data,
)
plt.show()
Observations
- Bookings with 0, 1 or 2 special requests appear to have a lot of cancellations, with 0 special requests having the most.
- With 0 special requests, cancellations occur from May to September.
- With 1 special request, cancellations occur from June to October.
- With 2 special requests, cancellations occur from August to October.
- There are no cancellations for bookings with 3 special requests.
There is an obvious trend: the more special requests there are, the fewer cancellations occur.
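One way to put numbers on a trend like this is a row-normalized cross-tabulation, which gives the share of cancelled bookings within each special request count. A sketch on made-up rows (the frame `toy` is hypothetical and much smaller than the real data):

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "no_of_special_requests": [0, 0, 0, 0, 1, 1, 2, 2],
        "booking_status": [
            "Canceled", "Canceled", "Not_Canceled", "Canceled",
            "Canceled", "Not_Canceled", "Not_Canceled", "Not_Canceled",
        ],
    }
)

# normalize="index" makes each row sum to 1, i.e. shares within each request count
rates = pd.crosstab(
    toy["no_of_special_requests"], toy["booking_status"], normalize="index"
)
print(rates)
```

A falling `Canceled` column from top to bottom would support the observation; on the real data the same call would use `data` instead of `toy`.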
## Code to encode values in the booking_status column. Not_Canceled will be encoded as 0 while Canceled will be encoded as 1
data["booking_status"] = data["booking_status"].apply(
lambda x: 1 if x == "Canceled" else 0
)
# verification check to ensure that "booking_status" is encoded properly,
data["booking_status"]
0 0
1 0
2 1
3 1
4 1
..
36270 0
36271 1
36272 0
36273 1
36274 0
Name: booking_status, Length: 36275, dtype: category
Categories (2, int64): [1, 0]
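As the output shows, the encoded column is still of category dtype because `apply` was run on a categorical column. If a plain integer column is preferred, one option is to compare and cast. This is a sketch on a made-up series, offered as an alternative rather than as what the rest of the notebook relies on:

```python
import pandas as pd

status = pd.Series(["Not_Canceled", "Canceled", "Canceled"], dtype="category")

# the comparison gives booleans; astype(int) turns them into 0/1 integers
encoded = (status == "Canceled").astype(int)
print(encoded.tolist(), encoded.dtype)
```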
# Code to fix some spaces in the type_of_meal_plan column
data["type_of_meal_plan"] = [col.replace(" ", "_") for col in data["type_of_meal_plan"]]
data.columns = [col.replace("/", "_") for col in data.columns]
data.columns = [col.replace("-", "_") for col in data.columns]
data.drop(["Booking_ID"], axis=1)
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal_Plan_1 | 0 | Room_Type 1 | 224.0 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0.0 | 0 |
| 1 | 2 | 0 | 2 | 3 | Not_Selected | 0 | Room_Type 1 | 5.0 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1.0 | 0 |
| 2 | 1 | 0 | 2 | 1 | Meal_Plan_1 | 0 | Room_Type 1 | 1.0 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0.0 | 1 |
| 3 | 2 | 0 | 0 | 2 | Meal_Plan_1 | 0 | Room_Type 1 | 211.0 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0.0 | 1 |
| 4 | 2 | 0 | 1 | 1 | Not_Selected | 0 | Room_Type 1 | 48.0 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50 | 0.0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 36270 | 3 | 0 | 2 | 6 | Meal_Plan_1 | 0 | Room_Type 4 | 85.0 | 2018 | 8 | 3 | Online | 0 | 0 | 0 | 167.80 | 1.0 | 0 |
| 36271 | 2 | 0 | 1 | 3 | Meal_Plan_1 | 0 | Room_Type 1 | 228.0 | 2018 | 10 | 17 | Online | 0 | 0 | 0 | 90.95 | 2.0 | 1 |
| 36272 | 2 | 0 | 2 | 6 | Meal_Plan_1 | 0 | Room_Type 1 | 148.0 | 2018 | 7 | 1 | Online | 0 | 0 | 0 | 98.39 | 2.0 | 0 |
| 36273 | 2 | 0 | 0 | 3 | Not_Selected | 0 | Room_Type 1 | 63.0 | 2018 | 4 | 21 | Online | 0 | 0 | 0 | 94.50 | 0.0 | 1 |
| 36274 | 2 | 0 | 1 | 2 | Meal_Plan_1 | 0 | Room_Type 1 | 207.0 | 2018 | 12 | 30 | Offline | 0 | 0 | 0 | 161.67 | 0.0 | 0 |
36275 rows × 18 columns
X = data.drop(["booking_status", "Booking_ID"], axis=1)
Y = data["booking_status"]
X = pd.get_dummies(X, drop_first=True)
# Splitting data in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
X, Y, test_size=0.30, random_state=1
)
print("Shape of Training set : ", X_train.shape)
print("Shape of test set : ", X_test.shape)
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Shape of Training set :  (25392, 34)
Shape of test set :  (10883, 34)
Percentage of classes in training set:
0    0.670644
1    0.329356
Name: booking_status, dtype: float64
Percentage of classes in test set:
0    0.676376
1    0.323624
Name: booking_status, dtype: float64
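The class shares in the two sets differ slightly (about 32.9% cancelled in training versus 32.4% in test) because the split is purely random. If matching proportions are wanted, `train_test_split` accepts a `stratify` argument. A sketch on made-up data (`X_toy` and `y_toy` are hypothetical):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# made-up data: 70 rows of class 0 and 30 rows of class 1
X_toy = pd.DataFrame({"feature": range(100)})
y_toy = pd.Series([0] * 70 + [1] * 30)

# stratify=y_toy keeps the class proportions the same in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.30, random_state=1, stratify=y_toy
)

print(y_tr.mean(), y_te.mean())
```

Whether stratification is worth it here is a judgment call; with this many rows the random split is already close.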
# Here we look at the first 5 rows of data
X_train.head()
| | no_of_weekend_nights | no_of_week_nights | lead_time | arrival_month | arrival_date | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | no_of_adults_1 | no_of_adults_2 | no_of_adults_3 | no_of_adults_4 | no_of_children_1 | no_of_children_2 | no_of_children_3 | no_of_children_9 | no_of_children_10 | type_of_meal_plan_Meal_Plan_2 | type_of_meal_plan_Meal_Plan_3 | type_of_meal_plan_Not_Selected | required_car_parking_space_1 | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 3 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | arrival_year_2018 | market_segment_type_Complementary | market_segment_type_Corporate | market_segment_type_Offline | market_segment_type_Online | repeated_guest_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13662 | 0 | 1 | 163.0 | 10 | 15 | 0 | 0 | 115.00 | 0.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 26641 | 0 | 3 | 113.0 | 3 | 31 | 0 | 0 | 78.15 | 1.0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 17835 | 2 | 3 | 289.5 | 10 | 14 | 0 | 0 | 78.00 | 1.0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 21485 | 0 | 3 | 136.0 | 6 | 29 | 0 | 0 | 85.50 | 0.0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 5670 | 1 | 2 | 21.0 | 8 | 15 | 0 | 0 | 151.00 | 0.0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
The model can make two kinds of wrong predictions:
- Predicting a booking will be Canceled but in reality the booking will be Not_Canceled.
- Predicting a booking will be Not_Canceled but in reality the booking will be Canceled.
Both cases are important as:
If we predict a booking is at risk of being Canceled but in reality it will be Not_Canceled, we will be wasting resources by
- deploying human resources to convince these people to stay via follow up calls and emails
- offering incentives, e.g. discount coupons, to people that don't need convincing
If we predict a booking will be Not_Canceled but in reality it will be Canceled, there will be a loss of revenue via
- the hotel being unable to resell the room
- being forced to reduce room prices at the last minute to get the rooms booked
- additional cost of distribution channels, by increasing commissions or paying for publicity to help sell these rooms
- human resources needed to help sell these rooms
Recall should be maximized, the greater the Recall score higher the chances of reducing false negatives.# defining a function to compute different metrics to check performance of a classification model built using statsmodels
def model_performance_classification_statsmodels(
    model, predictors, target, threshold=0.5
):
    """
    Function to compute different metrics to check classification model performance
    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    # checking which probabilities are greater than threshold
    pred_temp = model.predict(predictors) > threshold
    # rounding off the above values to get classes
    pred = np.round(pred_temp)
    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score
    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )
    return df_perf
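As an illustration of what the four metrics returned by this function measure, the sketch below computes them by hand from confusion-matrix counts. The counts are made-up numbers, not results from the hotel data, and `metrics_from_counts` is a hypothetical helper introduced only for this example.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Compute accuracy, recall, precision and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)  # share of actual positives (cancellations) that were caught
    precision = tp / (tp + fp)  # share of predicted positives that were correct
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"Accuracy": accuracy, "Recall": recall, "Precision": precision, "F1": f1}


# Illustrative counts only (assumed, not taken from the dataset)
example = metrics_from_counts(tp=60, fp=20, fn=40, tn=80)
print(example)
```

A false negative (`fn`) is a cancellation the model missed, which is why maximizing Recall targets the more costly error described above.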
# defining a function to plot the confusion_matrix of a classification model
def confusion_matrix_statsmodels(model, predictors, target, threshold=0.5):
    """
    To plot the confusion_matrix with percentages
    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    y_pred = model.predict(predictors) > threshold
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
# There are different solvers available in Sklearn logistic regression
# The newton-cg solver is faster for high-dimensional data
lg = LogisticRegression(solver="newton-cg", random_state=1)
model = lg.fit(X_train, y_train)
# predicting on training set
y_pred_train = lg.predict(X_train)
print("Training set performance:")
print("Accuracy:", accuracy_score(y_train, y_pred_train))
print("Precision:", precision_score(y_train, y_pred_train))
print("Recall:", recall_score(y_train, y_pred_train))
print("F1:", f1_score(y_train, y_pred_train))
Training set performance:
Accuracy: 0.8043478260869565
Precision: 0.7359933268455443
Recall: 0.6330264259237116
F1: 0.6806376960658267
# predicting on the test set
y_pred_test = lg.predict(X_test)
print("Test set performance:")
print("Accuracy:", accuracy_score(y_test, y_pred_test))
print("Precision:", precision_score(y_test, y_pred_test))
print("Recall:", recall_score(y_test, y_pred_test))
print("F1:", f1_score(y_test, y_pred_test))
Test set performance:
Accuracy: 0.8051088854176238
Precision: 0.7295968534906588
Recall: 0.6320272572402045
F1: 0.6773162939297125
Observations
The training and test F1 scores are both about 0.68 (0.681 and 0.677).
Because the scores on the train and test sets are comparable, the model is giving generalized results.
We have built a logistic regression model that performs consistently on the train and test sets, but to identify significant variables we will have to build a logistic regression model using the statsmodels library.
We will now perform logistic regression using statsmodels, a Python module that provides functions for the estimation of many statistical models, as well as for conducting statistical tests, and statistical data exploration.
Using statsmodels, we will be able to check the statistical validity of our model - identify the significant predictors from p-values that we get for each predictor variable.
Variance Inflation Factor (VIF): Variance inflation factors measure the inflation in the variances of the regression coefficient estimates due to collinearity among the predictors. The VIF of the kth predictor measures how much the variance of the estimated regression coefficient β̂k is "inflated" by the existence of correlation among the predictor variables in the model.
General rule of thumb: If VIF is 1, there is no correlation between the kth predictor and the remaining predictor variables, and hence the variance of β̂k is not inflated at all. If VIF exceeds 5, we say there is moderate multicollinearity, and if it is 10 or more, it shows signs of high multicollinearity. But the purpose of the analysis should dictate which threshold to use.
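The definition above corresponds to the standard formula VIF_k = 1 / (1 - R_k²), where R_k² comes from regressing the kth predictor on all the others. The sketch below computes it directly with NumPy on synthetic data, so the effect of a near-duplicate column is visible. The helper `vif_by_regression` and the data are assumptions made for illustration; note that this sketch adds an intercept to each auxiliary regression, whereas the statsmodels function uses the design matrix as given, so values can differ when no constant column is present.

```python
import numpy as np


def vif_by_regression(X):
    """Return VIF_k = 1 / (1 - R_k^2) for each column k of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for k in range(p):
        y = X[:, k]
        others = np.delete(X, k, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + remaining predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()  # R^2 of the auxiliary regression
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


# Synthetic data (assumed): x2 is nearly a copy of x1, x3 is unrelated
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
vifs = vif_by_regression(np.column_stack([x1, x2, x3]))
print(vifs.round(2))
```

The two nearly identical columns get large VIFs, while the unrelated column stays close to 1.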
# code to check multicollinearity using variance inflation factor calculations
vif_series = pd.Series(
[variance_inflation_factor(X_train.values, i) for i in range(X_train.shape[1])],
index=X_train.columns,
dtype=float,
)
print("Series before feature selection: \n\n{}\n".format(vif_series))
Series before feature selection:

no_of_weekend_nights                      1.981449
no_of_week_nights                         4.187612
lead_time                                 2.852895
arrival_month                             8.542290
arrival_date                              4.196274
no_of_previous_cancellations                   NaN
no_of_previous_bookings_not_canceled           NaN
avg_price_per_room                       23.362218
no_of_special_requests                    2.093220
no_of_adults_1                           39.000854
no_of_adults_2                          134.662999
no_of_adults_3                           14.144420
no_of_adults_4                            1.157834
no_of_children_1                          1.128298
no_of_children_2                          3.332683
no_of_children_3                          1.040794
no_of_children_9                          1.002322
no_of_children_10                         1.000652
type_of_meal_plan_Meal_Plan_2             1.399317
type_of_meal_plan_Meal_Plan_3             1.026860
type_of_meal_plan_Not_Selected            1.497166
required_car_parking_space_1              1.074415
room_type_reserved_Room_Type 2            1.241947
room_type_reserved_Room_Type 3            1.003559
room_type_reserved_Room_Type 4            1.746760
room_type_reserved_Room_Type 5            1.036397
room_type_reserved_Room_Type 6            3.020687
room_type_reserved_Room_Type 7            1.208170
arrival_year_2018                         7.772960
market_segment_type_Complementary         2.975427
market_segment_type_Corporate            10.792540
market_segment_type_Offline              52.074457
market_segment_type_Online              116.262937
repeated_guest_1                          1.366866
dtype: float64
# Here we are dropping "no_of_previous_cancellations" and renaming series
X_train1 = X_train.drop("no_of_previous_cancellations", axis=1)
vif_series2 = pd.Series(
[variance_inflation_factor(X_train1.values, i) for i in range(X_train1.shape[1])],
index=X_train1.columns,
)
print("Series before feature selection: \n\n{}\n".format(vif_series2))
Series before feature selection:

no_of_weekend_nights                      1.981449
no_of_week_nights                         4.187612
lead_time                                 2.852895
arrival_month                             8.542290
arrival_date                              4.196274
no_of_previous_bookings_not_canceled           NaN
avg_price_per_room                       23.362218
no_of_special_requests                    2.093220
no_of_adults_1                           39.000854
no_of_adults_2                          134.662999
no_of_adults_3                           14.144420
no_of_adults_4                            1.157834
no_of_children_1                          1.128298
no_of_children_2                          3.332683
no_of_children_3                          1.040794
no_of_children_9                          1.002322
no_of_children_10                         1.000652
type_of_meal_plan_Meal_Plan_2             1.399317
type_of_meal_plan_Meal_Plan_3             1.026860
type_of_meal_plan_Not_Selected            1.497166
required_car_parking_space_1              1.074415
room_type_reserved_Room_Type 2            1.241947
room_type_reserved_Room_Type 3            1.003559
room_type_reserved_Room_Type 4            1.746760
room_type_reserved_Room_Type 5            1.036397
room_type_reserved_Room_Type 6            3.020687
room_type_reserved_Room_Type 7            1.208170
arrival_year_2018                         7.772960
market_segment_type_Complementary         2.975427
market_segment_type_Corporate            10.792540
market_segment_type_Offline              52.074457
market_segment_type_Online              116.262937
repeated_guest_1                          1.366866
dtype: float64
# Here we are dropping "no_of_previous_bookings_not_canceled" and renaming series
X_train2 = X_train1.drop("no_of_previous_bookings_not_canceled", axis=1)
vif_series3 = pd.Series(
[variance_inflation_factor(X_train2.values, i) for i in range(X_train2.shape[1])],
index=X_train2.columns,
)
print("Series before feature selection: \n\n{}\n".format(vif_series3))
Series before feature selection:

no_of_weekend_nights                      1.981449
no_of_week_nights                         4.187612
lead_time                                 2.852895
arrival_month                             8.542290
arrival_date                              4.196274
avg_price_per_room                       23.362218
no_of_special_requests                    2.093220
no_of_adults_1                           39.000854
no_of_adults_2                          134.662999
no_of_adults_3                           14.144420
no_of_adults_4                            1.157834
no_of_children_1                          1.128298
no_of_children_2                          3.332683
no_of_children_3                          1.040794
no_of_children_9                          1.002322
no_of_children_10                         1.000652
type_of_meal_plan_Meal_Plan_2             1.399317
type_of_meal_plan_Meal_Plan_3             1.026860
type_of_meal_plan_Not_Selected            1.497166
required_car_parking_space_1              1.074415
room_type_reserved_Room_Type 2            1.241947
room_type_reserved_Room_Type 3            1.003559
room_type_reserved_Room_Type 4            1.746760
room_type_reserved_Room_Type 5            1.036397
room_type_reserved_Room_Type 6            3.020687
room_type_reserved_Room_Type 7            1.208170
arrival_year_2018                         7.772960
market_segment_type_Complementary         2.975427
market_segment_type_Corporate            10.792540
market_segment_type_Offline              52.074457
market_segment_type_Online              116.262937
repeated_guest_1                          1.366866
dtype: float64
Observation
Dropping no_of_previous_cancellations and no_of_previous_bookings_not_canceled removes the two undefined (NaN) VIF values. The other VIF values are unchanged; the remaining high values belong mainly to dummy variables (no_of_adults_*, market_segment_type_*) and to avg_price_per_room.
# for the data set we will drop "Booking_ID" since this is not an important independent variable.
# we will also drop "no_of_previous_bookings_not_canceled" and "no_of_previous_cancellations" since their VIF could not be computed (NaN).
X = data.drop(
[
"booking_status",
"Booking_ID",
"no_of_previous_bookings_not_canceled",
"no_of_previous_cancellations",
],
axis=1,
)
Y = data["booking_status"]
X = pd.get_dummies(X, drop_first=True)
# adding constant
X = sm.add_constant(X)
# Splitting data in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
X, Y, test_size=0.30, random_state=1
)
# fitting logistic regression model
logit = sm.Logit(y_train, X_train.astype(float))
lg = logit.fit(disp=False)
print(lg.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25359
Method: MLE Df Model: 32
Date: Fri, 19 Nov 2021 Pseudo R-squ.: 0.3292
Time: 14:19:17 Log-Likelihood: -10795.
converged: False LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
=====================================================================================================
coef std err z P>|z| [0.025 0.975]
-----------------------------------------------------------------------------------------------------
const -2.6675 0.432 -6.176 0.000 -3.514 -1.821
no_of_weekend_nights 0.1103 0.020 5.607 0.000 0.072 0.149
no_of_week_nights 0.0095 0.014 0.700 0.484 -0.017 0.036
lead_time 0.0164 0.000 59.742 0.000 0.016 0.017
arrival_month -0.0414 0.007 -6.361 0.000 -0.054 -0.029
arrival_date 0.0007 0.002 0.381 0.703 -0.003 0.005
avg_price_per_room 0.0206 0.001 25.539 0.000 0.019 0.022
no_of_special_requests -1.4906 0.030 -49.240 0.000 -1.550 -1.431
no_of_adults_1 -0.2828 0.339 -0.834 0.404 -0.947 0.382
no_of_adults_2 -0.0311 0.336 -0.092 0.926 -0.689 0.627
no_of_adults_3 -0.2247 0.349 -0.645 0.519 -0.908 0.458
no_of_adults_4 -0.5724 1.043 -0.549 0.583 -2.616 1.471
no_of_children_1 0.0520 0.087 0.599 0.549 -0.118 0.222
no_of_children_2 0.3120 0.193 1.621 0.105 -0.065 0.689
no_of_children_3 -0.0275 0.825 -0.033 0.973 -1.644 1.589
no_of_children_9 2.8864 1.488 1.940 0.052 -0.030 5.803
no_of_children_10 -13.2723 2224.721 -0.006 0.995 -4373.646 4347.101
type_of_meal_plan_Meal_Plan_2 0.2162 0.066 3.287 0.001 0.087 0.345
type_of_meal_plan_Meal_Plan_3 11.3902 171.869 0.066 0.947 -325.468 348.248
type_of_meal_plan_Not_Selected 0.2744 0.054 5.115 0.000 0.169 0.380
required_car_parking_space_1 -1.6180 0.138 -11.692 0.000 -1.889 -1.347
room_type_reserved_Room_Type 2 -0.4489 0.144 -3.113 0.002 -0.732 -0.166
room_type_reserved_Room_Type 3 -0.1014 1.322 -0.077 0.939 -2.693 2.490
room_type_reserved_Room_Type 4 -0.2474 0.055 -4.498 0.000 -0.355 -0.140
room_type_reserved_Room_Type 5 -0.7036 0.209 -3.370 0.001 -1.113 -0.294
room_type_reserved_Room_Type 6 -0.7488 0.193 -3.874 0.000 -1.128 -0.370
room_type_reserved_Room_Type 7 -0.6128 0.302 -2.027 0.043 -1.205 -0.020
arrival_year_2018 0.4915 0.060 8.184 0.000 0.374 0.609
market_segment_type_Complementary -31.5555 6.98e+04 -0.000 1.000 -1.37e+05 1.37e+05
market_segment_type_Corporate -1.2678 0.263 -4.820 0.000 -1.783 -0.752
market_segment_type_Offline -2.3049 0.252 -9.148 0.000 -2.799 -1.811
market_segment_type_Online -0.5288 0.249 -2.126 0.034 -1.016 -0.041
repeated_guest_1 -1.9228 0.378 -5.092 0.000 -2.663 -1.183
=====================================================================================================
print("Training performance:")
model_performance_classification_statsmodels(lg, X_train, y_train)
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.80486 | 0.6347 | 0.736404 | 0.68178 |
'no_of_week_nights' and 'arrival_date' have high p-values, and for 'no_of_adults' all the dummy levels have high p-values, which means these variables are not significant, so the complete variable can be dropped.
For the other attributes, the p-values are high only for a few dummy variables. Since only one (or some) of the categorical levels have a high p-value, we will drop them iteratively, as p-values sometimes change after a variable is dropped. So, we'll not drop all variables at once.
Instead, we will do the following repeatedly using a loop:
- Build a model, check the p-values of the variables, and drop the column with the highest p-value.
- Build a new model without the dropped feature and check the p-values again.
- Repeat until no column has a p-value greater than 0.05.
Note: The above process can also be done manually by picking one variable at a time that has a high p-value, dropping it, and building a model again. But that might be a little tedious, and using a loop will be more efficient.
# running a loop to drop variables with high p-value
# initial list of columns
cols = X_train.columns.tolist()
# setting an initial max p-value
max_p_value = 1
while len(cols) > 0:
    # defining the train set (cast to float, as in the other Logit fits)
    X_train_aux = X_train[cols].astype(float)
    # fitting the model
    model = sm.Logit(y_train, X_train_aux).fit(disp=False)
    # getting the p-values and the maximum p-value
    p_values = model.pvalues
    max_p_value = max(p_values)
    # name of the variable with maximum p-value
    feature_with_p_max = p_values.idxmax()
    if max_p_value > 0.05:
        cols.remove(feature_with_p_max)
    else:
        break
selected_features = cols
print(selected_features)
['const', 'no_of_weekend_nights', 'lead_time', 'arrival_month', 'avg_price_per_room', 'no_of_special_requests', 'no_of_adults_1', 'no_of_adults_3', 'type_of_meal_plan_Meal_Plan_2', 'type_of_meal_plan_Not_Selected', 'required_car_parking_space_1', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'arrival_year_2018', 'market_segment_type_Corporate', 'market_segment_type_Offline', 'repeated_guest_1']
# we create a new variable X_train1 to hold the independent variables that remain after dropping the features with high p-values
X_train1 = X_train[selected_features]
# fitting logistic regression model
logit2 = sm.Logit(y_train, X_train1.astype(float))
lg2 = logit2.fit(disp=False)
print(lg2.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25372
Method: MLE Df Model: 19
Date: Fri, 19 Nov 2021 Pseudo R-squ.: 0.3279
Time: 14:19:20 Log-Likelihood: -10815.
converged: True LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
==================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------
const -3.2435 0.106 -30.487 0.000 -3.452 -3.035
no_of_weekend_nights 0.1158 0.020 5.930 0.000 0.078 0.154
lead_time 0.0164 0.000 60.667 0.000 0.016 0.017
arrival_month -0.0421 0.006 -6.511 0.000 -0.055 -0.029
avg_price_per_room 0.0211 0.001 26.981 0.000 0.020 0.023
no_of_special_requests -1.4898 0.030 -49.417 0.000 -1.549 -1.431
no_of_adults_1 -0.2487 0.047 -5.323 0.000 -0.340 -0.157
no_of_adults_3 -0.2208 0.077 -2.886 0.004 -0.371 -0.071
type_of_meal_plan_Meal_Plan_2 0.2021 0.066 3.084 0.002 0.074 0.331
type_of_meal_plan_Not_Selected 0.2739 0.053 5.168 0.000 0.170 0.378
required_car_parking_space_1 -1.6208 0.138 -11.726 0.000 -1.892 -1.350
room_type_reserved_Room_Type 2 -0.3438 0.127 -2.706 0.007 -0.593 -0.095
room_type_reserved_Room_Type 4 -0.2434 0.054 -4.489 0.000 -0.350 -0.137
room_type_reserved_Room_Type 5 -0.7012 0.207 -3.381 0.001 -1.108 -0.295
room_type_reserved_Room_Type 6 -0.5196 0.114 -4.572 0.000 -0.742 -0.297
room_type_reserved_Room_Type 7 -0.5778 0.271 -2.133 0.033 -1.109 -0.047
arrival_year_2018 0.4899 0.060 8.195 0.000 0.373 0.607
market_segment_type_Corporate -0.7349 0.104 -7.064 0.000 -0.939 -0.531
market_segment_type_Offline -1.7725 0.052 -34.003 0.000 -1.875 -1.670
repeated_guest_1 -1.9181 0.377 -5.090 0.000 -2.657 -1.179
==================================================================================================
Now no feature has a p-value greater than 0.05, so we'll consider the features in X_train1 as the final ones and lg2 as the final model.
The coefficients of some attributes (lead_time, no_of_weekend_nights, type_of_meal_plan_Not_Selected, type_of_meal_plan_Meal_Plan_2) are positive, so an increase in these will lead to an increase in the chances of a booking being "Canceled".
The coefficients of some attributes (arrival_month, no_of_special_requests, room types 5, 6 and 7) are negative, so an increase in these will lead to a decrease in the chances of a booking being "Canceled".
# converting coefficients to odds
odds = np.exp(lg2.params)
# finding the percentage change
perc_change_odds = (np.exp(lg2.params) - 1) * 100
# removing limit from number of columns to display
pd.set_option("display.max_columns", None)
# adding the odds to a dataframe
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index=X_train1.columns).T
| | const | no_of_weekend_nights | lead_time | arrival_month | avg_price_per_room | no_of_special_requests | no_of_adults_1 | no_of_adults_3 | type_of_meal_plan_Meal_Plan_2 | type_of_meal_plan_Not_Selected | required_car_parking_space_1 | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | arrival_year_2018 | market_segment_type_Corporate | market_segment_type_Offline | repeated_guest_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Odds | 0.039026 | 1.122776 | 1.01658 | 0.958742 | 1.021285 | 0.225419 | 0.779833 | 0.801880 | 1.224011 | 1.315042 | 0.197738 | 0.709096 | 0.783951 | 0.495987 | 0.594758 | 0.561158 | 1.632146 | 0.479531 | 0.169909 | 0.146884 |
| Change_odd% | -96.097382 | 12.277600 | 1.65795 | -4.125811 | 2.128512 | -77.458088 | -22.016746 | -19.811969 | 22.401118 | 31.504152 | -80.226241 | -29.090384 | -21.604947 | -50.401307 | -40.524219 | -43.884197 | 63.214636 | -52.046885 | -83.009087 | -85.311589 |
Factors increasing the chance of a booking being canceled
- arrival_year_2018: This feature increases the odds of a cancellation the most, by a factor of 1.63 (a 63.2% increase in the odds). However, since the year is a one-off occurrence, we will ignore it; the model will be used to predict the risk of future bookings being canceled.
- type_of_meal_plan_Not_Selected: Ignoring arrival_year_2018, this feature increases the odds of a cancellation the most. Holding all other features constant, a booking with no meal plan selected has odds of cancellation 1.32 times those of the reference meal plan, a 31.5% increase in the odds.
- no_of_weekend_nights: Holding all other features constant, a 1 unit increase in no_of_weekend_nights multiplies the odds of a booking being canceled by 1.12, a 12.3% increase in the odds.
- type_of_meal_plan_Meal_Plan_2: Holding all other features constant, a booking with Meal Plan 2 has odds of cancellation 1.22 times those of the reference meal plan, a 22.4% increase in the odds.
- lead_time: Holding all other features constant, a 1 unit increase in lead_time multiplies the odds of a cancellation by about 1.017, a 1.66% increase in the odds.
Factors decreasing the chance of a booking being canceled
- repeated_guest_1: This feature decreases the odds of a cancellation the most. Holding all other features constant, a repeated guest has odds of cancellation 0.15 times those of a first-time guest, an 85.3% decrease in the odds.
- market_segment_type_Offline: This feature multiplies the odds of a booking being canceled by 0.17, an 83.0% decrease in the odds.
- market_segment_type_Corporate: This feature multiplies the odds of a booking being canceled by 0.48, a 52.0% decrease in the odds.
- no_of_special_requests: Holding all other features constant, a 1 unit increase in no_of_special_requests multiplies the odds of a cancellation by 0.23, a 77.5% decrease in the odds.
- required_car_parking_space_1: Holding all other features constant, requiring a parking space multiplies the odds of a booking being canceled by 0.20, an 80.2% decrease in the odds.
Similar interpretations can be done for the other attributes.
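As a worked example of how a coefficient turns into the odds figures quoted here, the sketch below applies exp(coef) and (exp(coef) - 1) * 100 to two coefficients copied (rounded to four decimals) from the lg2 summary. The helper name `odds_summary` is hypothetical; because the inputs are rounded, the results agree with the table only to about three decimals.

```python
import math


def odds_summary(coef):
    """Return (odds ratio, percentage change in odds) for a logit coefficient."""
    odds_ratio = math.exp(coef)
    pct_change = (odds_ratio - 1) * 100
    return odds_ratio, pct_change


# Coefficients as printed in the lg2 summary (rounded values)
coefficients = {"no_of_weekend_nights": 0.1158, "repeated_guest_1": -1.9181}
for name, coef in coefficients.items():
    ratio, pct = odds_summary(coef)
    print(f"{name}: odds ratio {ratio:.4f}, change in odds {pct:.1f}%")
```

An odds ratio above 1 corresponds to a positive coefficient and a percentage increase in the odds; a ratio below 1 corresponds to a negative coefficient and a percentage decrease.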
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_train1, y_train)
log_reg_model_train_perf = model_performance_classification_statsmodels(
lg2, X_train1, y_train
)
print("Training performance:")
log_reg_model_train_perf
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.804112 | 0.633505 | 0.735119 | 0.680539 |
logit_roc_auc_train = roc_auc_score(y_train, lg2.predict(X_train1))
fpr, tpr, thresholds = roc_curve(y_train, lg2.predict(X_train1))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
# Optimal threshold as per AUC-ROC curve
# The optimal cut off would be where tpr is high and fpr is low
fpr, tpr, thresholds = roc_curve(y_train, lg2.predict(X_train1))
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold_auc_roc = thresholds[optimal_idx]
print(optimal_threshold_auc_roc)
0.37293538490213085
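Picking the threshold that maximizes tpr - fpr is known as maximizing Youden's J statistic. The sketch below reproduces that rule with plain NumPy on a tiny made-up example, so each step can be checked by hand. The function `youden_threshold` and the toy labels and scores are assumptions for illustration; it treats a score greater than or equal to the threshold as a positive prediction.

```python
import numpy as np


def youden_threshold(y_true, scores):
    """Return the threshold (and J value) that maximizes TPR - FPR."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    best_t, best_j = None, -1.0
    for t in np.unique(scores):  # candidate thresholds in ascending order
        pred = scores >= t
        tpr = np.sum(pred & (y_true == 1)) / n_pos
        fpr = np.sum(pred & (y_true == 0)) / n_neg
        if tpr - fpr > best_j:  # keep the first threshold reaching the maximum
            best_t, best_j = t, tpr - fpr
    return best_t, best_j


# Toy data (assumed): one negative is scored above one positive
y_toy = [0, 0, 0, 1, 0, 1, 1, 1]
s_toy = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
best_t, best_j = youden_threshold(y_toy, s_toy)
print(best_t, best_j)
```

Lowering the threshold below 0.5, as happens in this notebook, trades some precision for higher recall.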
# creating confusion matrix
confusion_matrix_statsmodels(
lg2, X_train1, y_train, threshold=optimal_threshold_auc_roc
)
# checking model performance for this model
log_reg_model_train_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg2, X_train1, y_train, threshold=optimal_threshold_auc_roc
)
print("Training performance:")
log_reg_model_train_perf_threshold_auc_roc
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.791076 | 0.729882 | 0.667104 | 0.697082 |
y_scores = lg2.predict(X_train1) # calculate y_scores based on X_train1 series
prec, rec, tre = precision_recall_curve(y_train, y_scores,)
def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="precision")
    plt.plot(thresholds, recalls[:-1], "g--", label="recall")
    plt.xlabel("Threshold")
    plt.legend(loc="upper left")
    plt.ylim([0, 1])
plt.figure(figsize=(10, 7))
plot_prec_recall_vs_tresh(prec, rec, tre)
plt.show()
# setting the threshold
optimal_threshold_curve = 0.41
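The notebook does not say how 0.41 was chosen. Assuming it was read off the plot near the point where the precision and recall curves cross, the sketch below shows one way to locate such a point programmatically: scan the candidate thresholds and keep the one where the gap between precision and recall is smallest. The function `balanced_threshold` and the toy data are hypothetical and only illustrate the idea.

```python
import numpy as np


def balanced_threshold(y_true, scores):
    """Return (threshold, precision, recall) where |precision - recall| is smallest."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = np.sum(y_true == 1)
    best = None
    for t in np.unique(scores):  # each observed score is a candidate threshold
        pred = scores >= t  # at least one prediction is positive for every t
        tp = np.sum(pred & (y_true == 1))
        precision = tp / np.sum(pred)
        recall = tp / n_pos
        gap = abs(precision - recall)
        if best is None or gap < best[0]:
            best = (gap, t, precision, recall)
    return best[1], best[2], best[3]


# Toy data (assumed), not the hotel bookings
y_toy = [0, 0, 0, 1, 0, 1, 1, 1]
s_toy = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
t_bal, p_bal, r_bal = balanced_threshold(y_toy, s_toy)
print(t_bal, p_bal, r_bal)
```

With real model output, the same scan could be run over the arrays returned by `precision_recall_curve` instead of recomputing the metrics.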
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_train1, y_train, threshold=optimal_threshold_curve)
# here we calculate the training scores based on the statsmodel and X_train1 series using the optimal threshold as defined by precision-recall
log_reg_model_train_perf_threshold_curve = model_performance_classification_statsmodels(
lg2, X_train1, y_train, threshold=optimal_threshold_curve
)
print("Training performance:")
log_reg_model_train_perf_threshold_curve
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.796865 | 0.700825 | 0.688153 | 0.694431 |
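Using the optimal threshold set from the precision-recall curve (0.41), Recall on the training set is 0.70, compared with 0.73 for the ROC-based threshold (0.37) and 0.63 for the default threshold.
Since Recall is the metric to maximize, the model using the optimal threshold set by the ROC curve will be the logistic regression model of choice.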
# training performance comparison
models_train_comp_df = pd.concat(
[
log_reg_model_train_perf.T,
log_reg_model_train_perf_threshold_auc_roc.T,
log_reg_model_train_perf_threshold_curve.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Logistic Regression Default Threshold",
"Logistic Regression-0.37 Threshold",
"Logistic Regression-0.41 Threshold",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Logistic Regression Default Threshold | Logistic Regression-0.37 Threshold | Logistic Regression-0.41 Threshold |
|---|---|---|---|
| Accuracy | 0.804112 | 0.791076 | 0.796865 |
| Recall | 0.633505 | 0.729882 | 0.700825 |
| Precision | 0.735119 | 0.667104 | 0.688153 |
| F1 | 0.680539 | 0.697082 | 0.694431 |
Dropping the columns from the test set that were dropped from the training set
# Dropping the columns from the test set that were dropped from the training set
X_test1 = X_test[list(X_train1.columns)]
Using model with default threshold
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_test1, y_test)
# here we calculate the test scores based on the statsmodels model and the X_test1 data using the default threshold
log_reg_model_test_perf = model_performance_classification_statsmodels(
lg2, X_test1, y_test
)
print("Test performance:")
log_reg_model_test_perf
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.804558 | 0.633163 | 0.727569 | 0.677091 |
# Here we are plotting the ROC curve for the test set
logit_roc_auc_test = roc_auc_score(y_test, lg2.predict(X_test1))
fpr, tpr, thresholds = roc_curve(y_test, lg2.predict(X_test1))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_test)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
Using model with threshold=0.37
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_test1, y_test, threshold=optimal_threshold_auc_roc)
# checking model performance for this model
log_reg_model_test_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg2, X_test1, y_test, threshold=optimal_threshold_auc_roc
)
print("Test performance:")
log_reg_model_test_perf_threshold_auc_roc
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.796472 | 0.737649 | 0.668038 | 0.70112 |
Using model with threshold=0.41
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_test1, y_test, threshold=optimal_threshold_curve)
# here we calculate the test scores based on the statsmodel and X_test1 series using the optimal threshold based on precision-recall
log_reg_model_test_perf_threshold_curve = model_performance_classification_statsmodels(
lg2, X_test1, y_test, threshold=optimal_threshold_curve
)
print("Test performance:")
log_reg_model_test_perf_threshold_curve
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.798585 | 0.702726 | 0.683702 | 0.693083 |
# training performance comparison
models_train_comp_df = pd.concat(
[
log_reg_model_train_perf.T,
log_reg_model_train_perf_threshold_auc_roc.T,
log_reg_model_train_perf_threshold_curve.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Logistic Regression Default Threshold",
"Logistic Regression-0.37 Threshold",
"Logistic Regression-0.41 Threshold",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Logistic Regression Default Threshold | Logistic Regression-0.37 Threshold | Logistic Regression-0.41 Threshold |
|---|---|---|---|
| Accuracy | 0.804112 | 0.791076 | 0.796865 |
| Recall | 0.633505 | 0.729882 | 0.700825 |
| Precision | 0.735119 | 0.667104 | 0.688153 |
| F1 | 0.680539 | 0.697082 | 0.694431 |
# testing performance comparison
models_test_comp_df = pd.concat(
[
log_reg_model_test_perf.T,
log_reg_model_test_perf_threshold_auc_roc.T,
log_reg_model_test_perf_threshold_curve.T,
],
axis=1,
)
models_test_comp_df.columns = [
"Logistic Regression Default Threshold",
"Logistic Regression-0.37 Threshold",
"Logistic Regression-0.41 Threshold",
]
print("Test set performance comparison:")
models_test_comp_df
Test set performance comparison:
| | Logistic Regression Default Threshold | Logistic Regression-0.37 Threshold | Logistic Regression-0.41 Threshold |
|---|---|---|---|
| Accuracy | 0.804558 | 0.796472 | 0.798585 |
| Recall | 0.633163 | 0.737649 | 0.702726 |
| Precision | 0.727569 | 0.668038 | 0.683702 |
| F1 | 0.677091 | 0.701120 | 0.693083 |
We have been able to build a predictive model that can be used by the hotel management to determine bookings that are likely to be cancelled.
The logistic regression model with the threshold based on the ROC curve (0.37) has the highest Recall on the test set, 0.74.
All the logistic regression models have given a generalized performance on the training and test sets.
The coefficients of [no_of_weekend_nights, lead_time, type_of_meal_plan_Meal_Plan_2, type_of_meal_plan_Not_Selected] are positive, so an increase in these will lead to an increase in the chances of a booking being Canceled.
The coefficients of [no_of_special_requests, required_car_parking_space_1] are negative, so an increase in these will lead to a decrease in the chances of a booking being Canceled.
# We will use the data with Multicollinearity since decision trees are not affected by this.
X = data.drop(["booking_status", "Booking_ID"], axis=1)
Y = data["booking_status"]
X = pd.get_dummies(X, drop_first=True)
# Splitting data in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
X, Y, test_size=0.30, random_state=1
)
print(X_train.shape, X_test.shape)
(25392, 34) (10883, 34)
print("Number of rows in train data =", X_train.shape[0])
print("Number of rows in test data =", X_test.shape[0])
Number of rows in train data = 25392
Number of rows in test data = 10883
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Percentage of classes in training set:
0    0.670644
1    0.329356
Name: booking_status, dtype: float64
Percentage of classes in test set:
0    0.676376
1    0.323624
Name: booking_status, dtype: float64
The model can make two kinds of wrong predictions:
- Predicting a booking will be Canceled when in reality it will be Not_Canceled.
- Predicting a booking will be Not_Canceled when in reality it will be Canceled.
Both cases are important:
If we predict that a booking is at risk of being Canceled but it would not have been, resources are wasted:
- we will be deploying human resources to convince these guests to stay via follow-up calls and emails
- we will be offering incentives, e.g. discount coupons, to people who don't need convincing
If we predict that a booking will be Not_Canceled but in reality it is Canceled, revenue is lost through:
- the room going unsold when the hotel cannot resell it
- being forced to reduce room prices at the last minute to get rooms booked
- additional distribution-channel costs from higher commissions or paid publicity to help sell these rooms
- human resources needed to help sell these rooms
Recall should be maximized: the greater the Recall score, the higher the chances of reducing false negatives.
# Function to calculate recall score
def get_recall_score(model, predictors, target):
    """
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    prediction = model.predict(predictors)
    return recall_score(target, prediction)
def confusion_matrix_sklearn(model, predictors, target):
    """
    To plot the confusion_matrix with percentages
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    y_pred = model.predict(predictors)
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
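To make the link between the confusion matrix and recall concrete, here is a minimal sketch on a tiny hand-made example. The labels `y_true` and `y_pred` are invented for illustration and are not taken from the hotel data. Recall is the share of actual cancellations (class 1) that the model catches, TP / (TP + FN).

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical labels: 1 = Canceled, 0 = Not_Canceled
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])

# For binary labels, ravel() returns the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Recall computed by hand and with scikit-learn
manual_recall = tp / (tp + fn)
sk_recall = recall_score(y_true, y_pred)

print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Recall (manual):", manual_recall)
print("Recall (sklearn):", sk_recall)
```

One of the four actual cancellations is missed here, so both calculations give a recall of 0.75. Every missed cancellation (false negative) lowers recall, which is why the metric fits the goal stated above.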
If the frequency of class A is 10% and the frequency of class B is 90%, then class B becomes the dominant class and the decision tree becomes biased toward it.
In this case, we can pass a dictionary {0: 0.33, 1: 0.67} to the model to specify the weight of each class, so that the decision tree gives more weight to class 1. These weights are the training-set class shares swapped: class 0 makes up about 67% of the bookings and class 1 about 33%.
class_weight is a hyperparameter of the decision tree classifier.
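As a rough check on where weights such as {0: 0.33, 1: 0.67} can come from, the sketch below derives inverse-frequency weights with scikit-learn's `compute_class_weight` on a small synthetic label vector. `y_demo` is a hypothetical stand-in with the same 67/33 split, and normalizing the weights to sum to 1 is an assumption made here for readability, not something the classifier requires.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Synthetic labels (an assumption for illustration): 67% class 0, 33% class 1
y_demo = np.array([0] * 67 + [1] * 33)
classes = np.array([0, 1])

# "balanced" weights are n_samples / (n_classes * count_of_class)
balanced = compute_class_weight(class_weight="balanced", classes=classes, y=y_demo)

# Rescale so the weights sum to 1; only their ratio matters to the tree
normalized = balanced / balanced.sum()
weights = {int(c): round(float(w), 2) for c, w in zip(classes, normalized)}
print(weights)  # {0: 0.33, 1: 0.67}
```

Passing `class_weight="balanced"` to `DecisionTreeClassifier` would apply the same ratio without writing the dictionary by hand.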
# Here we set the parameters for building the default tree
model = DecisionTreeClassifier(
    criterion="gini", class_weight={0: 0.33, 1: 0.67}, random_state=1
)
# fitting the tree to the training data
model.fit(X_train, y_train)
DecisionTreeClassifier(class_weight={0: 0.33, 1: 0.67}, random_state=1)
# creating the confusion matrix
confusion_matrix_sklearn(model, X_train, y_train)
# calculating the recall score on training data for the original tree
decision_tree_perf_train = get_recall_score(model, X_train, y_train)
print("Recall Score:", decision_tree_perf_train)
Recall Score: 0.9950974530670812
# creating the confusion matrix
confusion_matrix_sklearn(model, X_test, y_test)
decision_tree_perf_test = get_recall_score(model, X_test, y_test)
print("Recall Score:", decision_tree_perf_test)
Recall Score: 0.8038046564452016
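The two recall scores printed above differ noticeably. The following sketch reproduces that kind of train/test comparison on synthetic data, so it can be run without the hotel dataset. `make_classification` and all the `*_demo` names are assumptions for illustration; the numbers it prints will not match the notebook's.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy, imbalanced data standing in for the bookings (assumption)
X_demo, y_demo = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=4,
    weights=[0.67, 0.33],
    flip_y=0.05,
    random_state=1,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=1
)

# Fully grown tree with the same class weights as in the notebook
full_tree = DecisionTreeClassifier(
    criterion="gini", class_weight={0: 0.33, 1: 0.67}, random_state=1
)
full_tree.fit(X_tr, y_tr)

# Recall on the data the tree has seen versus data it has not
train_recall = recall_score(y_tr, full_tree.predict(X_tr))
test_recall = recall_score(y_te, full_tree.predict(X_te))
recall_gap = train_recall - test_recall
print("Train recall:", train_recall)
print("Test recall:", test_recall)
print("Gap:", recall_gap)
```

An unrestricted tree keeps splitting until its leaves are pure, so it tends to score near-perfect recall on the training data. A clearly lower score on held-out data is the usual sign that it has memorized noise.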
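Recall is about 0.99 on the training data but drops to about 0.8 on the test data, which suggests that the tree is overfitting.
# creating a list of column names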
feature_names = X_train.columns.to_list()
# plotting the decision tree
plt.figure(figsize=(20, 30))
out = tree.plot_tree(
    model,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of the decision tree
print(tree.export_text(model, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_weekend_nights <= 0.50 | | | | | |--- avg_price_per_room <= 179.47 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- lead_time <= 16.50 | | | | | | | | |--- avg_price_per_room <= 68.50 | | | | | | | | | |--- repeated_guest_1 <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 48.50 | | | | | | | | | | | |--- weights: [20.79, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 48.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- repeated_guest_1 > 0.50 | | | | | | | | | | |--- weights: [35.97, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 68.50 | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | |--- no_of_adults_2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- no_of_adults_2 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | |--- avg_price_per_room <= 115.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 115.50 | | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | |--- lead_time > 16.50 | | | | | | | | |--- avg_price_per_room <= 135.00 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- repeated_guest_1 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | | |--- repeated_guest_1 > 0.50 | | | | | | | | | | | |--- weights: [4.95, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [9.57, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 135.00 | | | | | | | | | |--- weights: [0.00, 5.36] class: 1 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | |--- weights: [16.83, 0.00] class: 0 | | | 
| | | | |--- arrival_month > 1.50 | | | | | | | | |--- weights: [513.15, 0.00] class: 0 | | | | | |--- avg_price_per_room > 179.47 | | | | | | |--- arrival_date <= 25.50 | | | | | | | |--- lead_time <= 18.00 | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | |--- lead_time > 18.00 | | | | | | | | |--- weights: [0.00, 10.72] class: 1 | | | | | | |--- arrival_date > 25.50 | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | |--- no_of_weekend_nights > 0.50 | | | | | |--- lead_time <= 68.50 | | | | | | |--- arrival_month <= 9.50 | | | | | | | |--- avg_price_per_room <= 63.29 | | | | | | | | |--- arrival_date <= 20.50 | | | | | | | | | |--- type_of_meal_plan_Not_Selected <= 0.50 | | | | | | | | | | |--- weights: [18.48, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not_Selected > 0.50 | | | | | | | | | | |--- lead_time <= 10.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 10.00 | | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | |--- arrival_date > 20.50 | | | | | | | | | |--- avg_price_per_room <= 59.75 | | | | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 23.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 59.75 | | | | | | | | | | |--- lead_time <= 44.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 44.00 | | | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 63.29 | | | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | | | |--- lead_time <= 59.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- 
arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | |--- lead_time > 59.50 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | | |--- weights: [8.91, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | | | |--- no_of_adults_1 <= 0.50 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | |--- no_of_adults_1 > 0.50 | | | | | | | | | | |--- weights: [0.00, 6.70] class: 1 | | | | | | |--- arrival_month > 9.50 | | | | | | | |--- lead_time <= 65.50 | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | |--- weights: [45.87, 0.00] class: 0 | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | |--- lead_time <= 10.50 | | | | | | | | | | |--- type_of_meal_plan_Meal_Plan_2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- type_of_meal_plan_Meal_Plan_2 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | | |--- lead_time > 10.50 | | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | |--- lead_time > 65.50 | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | |--- no_of_adults_1 <= 0.50 | | | | | | | | | | | |--- weights: [1.65, 1.34] class: 0 | | | | | | | | | | |--- no_of_adults_1 > 0.50 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | 
| | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | |--- avg_price_per_room <= 46.35 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 46.35 | | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | | | |--- lead_time > 68.50 | | | | | | |--- avg_price_per_room <= 99.98 | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | |--- avg_price_per_room <= 62.50 | | | | | | | | | |--- weights: [6.93, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 62.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- lead_time <= 81.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- lead_time > 81.50 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | |--- arrival_month > 3.50 | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | |--- lead_time <= 88.50 | | | | | | | | | | |--- arrival_date <= 25.50 | | | | | | | | | | | |--- weights: [20.13, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 25.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 88.50 | | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | |--- lead_time <= 73.50 | | | | | | | | | | |--- weights: [0.00, 2.01] class: 1 | | | | | | | | | |--- lead_time > 73.50 | | | | | | | | | | |--- lead_time <= 81.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 81.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | |--- avg_price_per_room > 99.98 | | | 
| | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | |--- weights: [3.96, 0.00] class: 0 | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | |--- avg_price_per_room <= 132.43 | | | | | | | | | |--- lead_time <= 81.00 | | | | | | | | | | |--- avg_price_per_room <= 123.25 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- avg_price_per_room > 123.25 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | |--- lead_time > 81.00 | | | | | | | | | | |--- avg_price_per_room <= 122.22 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 122.22 | | | | | | | | | | | |--- weights: [0.00, 1.34] class: 1 | | | | | | | | |--- avg_price_per_room > 132.43 | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- lead_time <= 117.50 | | | | | |--- avg_price_per_room <= 93.58 | | | | | | |--- avg_price_per_room <= 75.07 | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | | | |--- weights: [2.64, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 58.75 | | | | | | | | | |--- repeated_guest_1 <= 0.50 | | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- repeated_guest_1 > 0.50 | | | | | | | | | | |--- weights: [1.98, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | |--- arrival_date <= 11.50 | | | | | | | | | |--- weights: [13.86, 0.00] class: 0 | | | | | | | | |--- arrival_date > 11.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 71.12 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 71.12 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- 
no_of_weekend_nights > 1.50 | | | | | | | | | | |--- arrival_month <= 9.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_month > 9.00 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 75.07 | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | |--- avg_price_per_room <= 88.50 | | | | | | | | | |--- arrival_date <= 20.50 | | | | | | | | | | |--- avg_price_per_room <= 80.25 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 80.25 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 20.50 | | | | | | | | | | |--- weights: [14.19, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 88.50 | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | |--- arrival_month > 3.50 | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- weights: [0.00, 7.37] class: 1 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- lead_time <= 104.50 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 104.50 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | |--- no_of_adults_2 <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 86.68 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 86.68 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_adults_2 > 0.50 | | | | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 22.50 | | | | | | | | | | | |--- truncated branch of 
depth 6 | | | | | |--- avg_price_per_room > 93.58 | | | | | | |--- arrival_date <= 11.50 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- lead_time <= 107.50 | | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | | |--- weights: [3.63, 4.02] class: 1 | | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | | |--- lead_time > 107.50 | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | |--- avg_price_per_room <= 116.78 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 116.78 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | |--- weights: [0.00, 5.36] class: 1 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- avg_price_per_room <= 125.00 | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | |--- lead_time <= 97.00 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 97.00 | | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- avg_price_per_room > 125.00 | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | |--- arrival_date > 11.50 | | | | | | | |--- avg_price_per_room <= 102.09 | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | |--- lead_time <= 114.50 | | | | | | | | | | |--- avg_price_per_room <= 95.44 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 95.44 | | | | | | | | | | | |--- weights: [0.00, 50.25] class: 1 | | | | | | | | | |--- 
lead_time > 114.50 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 102.09 | | | | | | | | |--- avg_price_per_room <= 109.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- lead_time <= 101.00 | | | | | | | | | | | |--- weights: [0.00, 7.37] class: 1 | | | | | | | | | | |--- lead_time > 101.00 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- weights: [14.85, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 109.50 | | | | | | | | | |--- avg_price_per_room <= 124.25 | | | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | | | |--- weights: [0.00, 31.49] class: 1 | | | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 124.25 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | |--- lead_time > 117.50 | | | | | |--- no_of_adults_2 <= 0.50 | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | |--- weights: [0.00, 1.34] class: 1 | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | |--- no_of_adults_1 <= 0.50 | | | | | | | | |--- lead_time <= 146.00 | | | | | | | | | |--- no_of_weekend_nights <= 1.00 | | | | | | | | | | |--- weights: [1.32, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 1.00 | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | |--- lead_time > 146.00 | | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | | |--- weights: [0.33, 0.67] class: 1 | | | | 
| | | |--- no_of_adults_1 > 0.50 | | | | | | | | |--- weights: [46.20, 0.00] class: 0 | | | | | |--- no_of_adults_2 > 0.50 | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | |--- avg_price_per_room <= 103.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 103.50 | | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- weights: [18.48, 0.00] class: 0 | | | | | | | |--- arrival_date > 7.50 | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | |--- avg_price_per_room <= 89.88 | | | | | | | | | | |--- avg_price_per_room <= 67.38 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 67.38 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- avg_price_per_room > 89.88 | | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | |--- type_of_meal_plan_Meal_Plan_2 <= 0.50 | | | | | | | | | | |--- weights: [1.32, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Meal_Plan_2 > 0.50 | | | | | | | | | | |--- weights: [4.29, 0.67] class: 0 | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | |--- arrival_date <= 20.50 | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | |--- lead_time <= 144.50 | | | | | | | | | | |--- arrival_date <= 17.50 | | | | | | | | | | | |--- weights: [10.23, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 17.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | 
|--- lead_time > 144.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | |--- avg_price_per_room <= 65.25 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 65.25 | | | | | | | | | | |--- weights: [0.00, 2.01] class: 1 | | | | | | | |--- arrival_date > 20.50 | | | | | | | | |--- arrival_month <= 5.00 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- weights: [1.98, 0.67] class: 0 | | | | | | | | |--- arrival_month > 5.00 | | | | | | | | | |--- weights: [23.76, 0.00] class: 0 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 13.50 | | | | |--- avg_price_per_room <= 99.44 | | | | | |--- arrival_month <= 1.50 | | | | | | |--- weights: [40.92, 0.00] class: 0 | | | | | |--- arrival_month > 1.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | |--- avg_price_per_room <= 70.05 | | | | | | | | | |--- weights: [13.86, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 70.05 | | | | | | | | | |--- lead_time <= 5.50 | | | | | | | | | | |--- no_of_adults_2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- no_of_adults_2 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | |--- lead_time > 5.50 | | | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | |--- no_of_adults_2 <= 0.50 | | | | | | | | | |--- weights: [0.00, 8.71] class: 1 | | | 
| | | | | |--- no_of_adults_2 > 0.50 | | | | | | | | | |--- lead_time <= 2.50 | | | | | | | | | | |--- avg_price_per_room <= 74.21 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 74.21 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 2.50 | | | | | | | | | | |--- lead_time <= 11.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 11.00 | | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | |--- avg_price_per_room <= 94.66 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- weights: [52.80, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- avg_price_per_room <= 90.21 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 90.21 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- avg_price_per_room > 94.66 | | | | | | | | | |--- avg_price_per_room <= 95.10 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.01] class: 1 | | | | | | | | | |--- avg_price_per_room > 95.10 | | | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | | | |--- weights: [5.28, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | | |--- lead_time <= 3.50 | | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 3.50 | | | 
| | | | | | | | |--- weights: [0.00, 4.69] class: 1 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [3.30, 0.00] class: 0 | | | | |--- avg_price_per_room > 99.44 | | | | | |--- lead_time <= 3.50 | | | | | | |--- avg_price_per_room <= 178.78 | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | |--- type_of_meal_plan_Not_Selected <= 0.50 | | | | | | | | | | |--- required_car_parking_space_1 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- required_car_parking_space_1 > 0.50 | | | | | | | | | | | |--- weights: [2.64, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not_Selected > 0.50 | | | | | | | | | | |--- weights: [4.62, 0.00] class: 0 | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | | |--- weights: [12.54, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | | |--- arrival_date <= 20.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- arrival_date > 20.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | |--- weights: [0.00, 2.68] class: 1 | | | | | | |--- avg_price_per_room > 178.78 | | | | | | | |--- arrival_date <= 24.50 | | | | | | | | |--- no_of_adults_3 <= 0.50 | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- no_of_adults_3 > 0.50 | | | | | | | | | |--- lead_time <= 0.50 | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | | |--- lead_time > 0.50 | | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | 
| | |--- arrival_date > 24.50 | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | |--- weights: [1.98, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | |--- lead_time > 3.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- avg_price_per_room <= 119.25 | | | | | | | | |--- avg_price_per_room <= 118.50 | | | | | | | | | |--- lead_time <= 12.50 | | | | | | | | | | |--- no_of_adults_2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_adults_2 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- lead_time > 12.50 | | | | | | | | | | |--- type_of_meal_plan_Not_Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- type_of_meal_plan_Not_Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- avg_price_per_room > 118.50 | | | | | | | | | |--- no_of_children_1 <= 0.50 | | | | | | | | | | |--- weights: [3.30, 0.00] class: 0 | | | | | | | | | |--- no_of_children_1 > 0.50 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | |--- avg_price_per_room > 119.25 | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | |--- arrival_date <= 16.50 | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | | |--- arrival_date > 16.50 | | | | | | | | | | |--- no_of_children_2 <= 0.50 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | | |--- no_of_children_2 > 0.50 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | |--- no_of_adults_3 <= 0.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | | | |--- truncated 
branch of depth 13 | | | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | |--- no_of_adults_3 > 0.50 | | | | | | | | | | |--- lead_time <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 5.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | |--- avg_price_per_room <= 143.83 | | | | | | | | | |--- weights: [9.57, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 143.83 | | | | | | | | | |--- lead_time <= 9.50 | | | | | | | | | | |--- weights: [1.98, 0.00] class: 0 | | | | | | | | | |--- lead_time > 9.50 | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_date <= 14.00 | | | | | | | | | | |--- arrival_date <= 1.50 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 1.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- arrival_date > 14.00 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- weights: [4.95, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [6.93, 0.00] class: 0 | | | |--- lead_time > 13.50 | | | | |--- required_car_parking_space_1 <= 0.50 | | | | | |--- avg_price_per_room <= 71.92 | | | | | | |--- avg_price_per_room <= 59.43 | | | | | | | |--- lead_time <= 84.50 | | | | | | | | |--- arrival_date <= 17.50 | | | | | | | | | |--- lead_time <= 51.50 | | | | | | | | | | |--- avg_price_per_room <= 29.04 | | | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 29.04 | | | | | | | | | | | |--- truncated 
branch of depth 5 | | | | | | | | | |--- lead_time > 51.50 | | | | | | | | | | |--- weights: [5.61, 0.00] class: 0 | | | | | | | | |--- arrival_date > 17.50 | | | | | | | | | |--- weights: [10.23, 0.00] class: 0 | | | | | | | |--- lead_time > 84.50 | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | |--- arrival_date <= 27.00 | | | | | | | | | | |--- lead_time <= 131.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 131.50 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 27.00 | | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | |--- weights: [4.62, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 59.43 | | | | | | | |--- lead_time <= 25.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | |--- no_of_week_nights <= 5.00 | | | | | | | | | | | |--- weights: [3.63, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 5.00 | | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | |--- avg_price_per_room <= 62.38 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 62.38 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [4.95, 0.00] class: 0 | | | | | | | |--- lead_time > 25.50 | | | | | | | | |--- avg_price_per_room <= 71.34 | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | |--- lead_time <= 68.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- lead_time > 68.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | |--- lead_time <= 102.00 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- lead_time > 
102.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- avg_price_per_room > 71.34 | | | | | | | | | |--- weights: [4.95, 0.00] class: 0 | | | | | |--- avg_price_per_room > 71.92 | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | |--- lead_time <= 65.50 | | | | | | | | |--- avg_price_per_room <= 120.45 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- no_of_adults_3 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_adults_3 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | |--- avg_price_per_room > 120.45 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- arrival_date <= 14.00 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 14.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | |--- lead_time > 65.50 | | | | | | | | |--- type_of_meal_plan_Meal_Plan_2 <= 0.50 | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | |--- avg_price_per_room <= 75.75 | | | | | | | | | | | |--- weights: [0.00, 4.69] class: 1 | | | | | | | | | | |--- avg_price_per_room > 75.75 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | | |--- type_of_meal_plan_Meal_Plan_2 > 0.50 | | | | | | | | | |--- weights: [0.00, 28.14] class: 1 | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | |--- avg_price_per_room <= 104.31 | | | | | | | | |--- lead_time <= 25.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | | |--- weights: [7.26, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 1.50 | 
| | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [10.23, 0.00] class: 0 | | | | | | | | |--- lead_time > 25.50 | | | | | | | | | |--- type_of_meal_plan_Not_Selected <= 0.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 23 | | | | | | | | | |--- type_of_meal_plan_Not_Selected > 0.50 | | | | | | | | | | |--- no_of_adults_1 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | | |--- no_of_adults_1 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | |--- avg_price_per_room > 104.31 | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 144.76 | | | | | | | | | | | |--- truncated branch of depth 24 | | | | | | | | | | |--- avg_price_per_room > 144.76 | | | | | | | | | | | |--- truncated branch of depth 26 | | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 22.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | |--- avg_price_per_room <= 168.06 | | | | | | | | | | |--- lead_time <= 22.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 22.00 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- avg_price_per_room > 168.06 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | |--- required_car_parking_space_1 > 0.50 | | | | | |--- 
no_of_weekend_nights <= 3.00 | | | | | | |--- weights: [21.45, 0.00] class: 0 | | | | | |--- no_of_weekend_nights > 3.00 | | | | | | |--- weights: [0.00, 0.67] class: 1 | |--- no_of_special_requests > 0.50 | | |--- no_of_special_requests <= 1.50 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- lead_time <= 102.50 | | | | | |--- type_of_meal_plan_Not_Selected <= 0.50 | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | |--- lead_time <= 91.50 | | | | | | | | | |--- avg_price_per_room <= 129.50 | | | | | | | | | | |--- weights: [279.84, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 129.50 | | | | | | | | | | |--- avg_price_per_room <= 131.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 131.50 | | | | | | | | | | | |--- weights: [8.91, 0.00] class: 0 | | | | | | | | |--- lead_time > 91.50 | | | | | | | | | |--- no_of_children_1 <= 0.50 | | | | | | | | | | |--- weights: [14.19, 0.00] class: 0 | | | | | | | | | |--- no_of_children_1 > 0.50 | | | | | | | | | | |--- lead_time <= 95.50 | | | | | | | | | | | |--- weights: [0.00, 1.34] class: 1 | | | | | | | | | | |--- lead_time > 95.50 | | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | |--- avg_price_per_room <= 138.55 | | | | | | | | | |--- no_of_adults_2 <= 0.50 | | | | | | | | | | |--- weights: [1.32, 0.00] class: 0 | | | | | | | | | |--- no_of_adults_2 > 0.50 | | | | | | | | | | |--- weights: [2.31, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 138.55 | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.34] class: 1 | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | |--- 
type_of_meal_plan_Not_Selected > 0.50 | | | | | | |--- lead_time <= 63.00 | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | |--- weights: [5.94, 0.00] class: 0 | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | |--- weights: [0.66, 0.67] class: 1 | | | | | | |--- lead_time > 63.00 | | | | | | | |--- weights: [0.00, 3.35] class: 1 | | | | |--- lead_time > 102.50 | | | | | |--- no_of_week_nights <= 2.50 | | | | | | |--- lead_time <= 105.00 | | | | | | | |--- avg_price_per_room <= 67.65 | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 67.65 | | | | | | | | |--- weights: [0.00, 2.68] class: 1 | | | | | | |--- lead_time > 105.00 | | | | | | | |--- avg_price_per_room <= 83.39 | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | |--- weights: [1.98, 0.00] class: 0 | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | |--- arrival_date <= 6.50 | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 6.50 | | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | | |--- weights: [0.00, 3.35] class: 1 | | | | | | | |--- avg_price_per_room > 83.39 | | | | | | | | |--- avg_price_per_room <= 141.25 | | | | | | | | | |--- lead_time <= 143.50 | | | | | | | | | | |--- arrival_date <= 25.00 | | | | | | | | | | | |--- weights: [7.59, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 25.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 143.50 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | 
| | | | |--- avg_price_per_room > 141.25 | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | |--- no_of_week_nights > 2.50 | | | | | | |--- avg_price_per_room <= 122.00 | | | | | | | |--- weights: [18.81, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 122.00 | | | | | | | |--- type_of_meal_plan_Meal_Plan_2 <= 0.50 | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | |--- weights: [0.00, 1.34] class: 1 | | | | | | | |--- type_of_meal_plan_Meal_Plan_2 > 0.50 | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | |--- market_segment_type_Online > 0.50 | | | | |--- lead_time <= 8.50 | | | | | |--- lead_time <= 4.50 | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | |--- avg_price_per_room <= 157.64 | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | |--- arrival_date <= 4.50 | | | | | | | | | | |--- weights: [26.73, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 4.50 | | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | | |--- weights: [22.77, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | |--- avg_price_per_room > 157.64 | | | | | | | | |--- avg_price_per_room <= 158.50 | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | |--- avg_price_per_room > 158.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- lead_time <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | 
| |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- arrival_date <= 17.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 17.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | |--- lead_time <= 0.50 | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | |--- lead_time > 0.50 | | | | | | | | |--- weights: [0.00, 1.34] class: 1 | | | | | |--- lead_time > 4.50 | | | | | | |--- arrival_date <= 13.50 | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 88.39 | | | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | | | |--- weights: [5.28, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 88.39 | | | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- weights: [0.00, 2.01] class: 1 | | | | | | | |--- arrival_month > 9.50 | | | | | | | | |--- avg_price_per_room <= 157.12 | | | | | | | | | |--- required_car_parking_space_1 <= 0.50 | | | | | | | | | | |--- weights: [13.86, 0.00] class: 0 | | | | | | | | | |--- required_car_parking_space_1 > 0.50 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 157.12 | | | | | | | | | |--- avg_price_per_room <= 172.65 | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | | |--- avg_price_per_room > 172.65 | | | | | | | | | | |--- weights: 
[0.66, 0.00] class: 0 | | | | | | |--- arrival_date > 13.50 | | | | | | | |--- type_of_meal_plan_Not_Selected <= 0.50 | | | | | | | | |--- avg_price_per_room <= 139.57 | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | | |--- weights: [32.01, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | | | | |--- weights: [3.30, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 23.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- avg_price_per_room > 139.57 | | | | | | | | | |--- arrival_date <= 15.50 | | | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_date > 15.50 | | | | | | | | | | |--- avg_price_per_room <= 140.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 140.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- type_of_meal_plan_Not_Selected > 0.50 | | | | | | | | |--- avg_price_per_room <= 126.33 | | | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | | | |--- weights: [7.92, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 21.50 | | | | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 23.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- avg_price_per_room > 126.33 | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | |--- arrival_date <= 26.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 26.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | 
| | | | | |--- arrival_month > 8.50 | | | | | | | | | | |--- lead_time <= 6.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 6.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | |--- lead_time > 8.50 | | | | | |--- required_car_parking_space_1 <= 0.50 | | | | | | |--- avg_price_per_room <= 118.55 | | | | | | | |--- lead_time <= 61.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [31.02, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- no_of_weekend_nights <= 4.50 | | | | | | | | | | |--- weights: [56.10, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 4.50 | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | |--- lead_time > 61.50 | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- no_of_children_1 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- no_of_children_1 > 0.50 | | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- lead_time <= 66.50 | | | | | | | | | | | |--- weights: [2.31, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 66.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- avg_price_per_room <= 71.93 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 71.93 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | |--- arrival_month > 9.50 | | 
| | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | |--- avg_price_per_room > 118.55 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | |--- avg_price_per_room <= 177.15 | | | | | | | | | | |--- avg_price_per_room <= 118.98 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- avg_price_per_room > 118.98 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | |--- avg_price_per_room > 177.15 | | | | | | | | | | |--- arrival_date <= 7.00 | | | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 7.00 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | |--- avg_price_per_room <= 121.20 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- avg_price_per_room > 121.20 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | |--- lead_time <= 55.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- lead_time > 55.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- lead_time <= 14.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 14.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- weights: [16.50, 0.00] class: 0 | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- avg_price_per_room <= 119.20 | | | | | | | | 
| | | |--- truncated branch of depth 7 | | | | | | | | | | |--- avg_price_per_room > 119.20 | | | | | | | | | | | |--- truncated branch of depth 22 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- lead_time <= 100.00 | | | | | | | | | | | |--- weights: [22.11, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 100.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | |--- required_car_parking_space_1 > 0.50 | | | | | | |--- room_type_reserved_Room_Type 7 <= 0.50 | | | | | | | |--- weights: [59.40, 0.00] class: 0 | | | | | | |--- room_type_reserved_Room_Type 7 > 0.50 | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_week_nights <= 3.50 | | | | | |--- no_of_week_nights <= 0.50 | | | | | | |--- weights: [60.06, 0.00] class: 0 | | | | | |--- no_of_week_nights > 0.50 | | | | | | |--- weights: [641.52, 0.00] class: 0 | | | | |--- no_of_week_nights > 3.50 | | | | | |--- no_of_special_requests <= 2.25 | | | | | | |--- lead_time <= 6.50 | | | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | | | |--- weights: [13.86, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | |--- lead_time > 6.50 | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- weights: [5.61, 0.00] class: 0 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | | |--- avg_price_per_room <= 93.06 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 93.06 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | | |--- lead_time <= 20.00 | | | | 
| | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 20.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- type_of_meal_plan_Meal_Plan_2 <= 0.50 | | | | | | | | | | |--- weights: [10.89, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Meal_Plan_2 > 0.50 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- lead_time <= 80.00 | | | | | | | | | | |--- lead_time <= 19.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 19.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- lead_time > 80.00 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- weights: [0.00, 1.34] class: 1 | | | | | |--- no_of_special_requests > 2.25 | | | | | | |--- no_of_adults_3 <= 0.50 | | | | | | | |--- weights: [19.47, 0.00] class: 0 | | | | | | |--- no_of_adults_3 > 0.50 | | | | | | | |--- weights: [3.63, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- no_of_special_requests <= 2.25 | | | | | |--- arrival_month <= 8.50 | | | | | | |--- lead_time <= 150.50 | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | |--- arrival_date <= 4.50 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 4.50 | | | | | | | | | | |--- arrival_date <= 26.00 | | | | | | | | | | | |--- weights: [0.00, 3.35] class: 1 | | | | | | | | | | |--- arrival_date > 26.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | |--- arrival_date <= 14.00 | | | | | | | | | | |--- lead_time <= 98.00 | | | | | | | | | | | |--- weights: 
[0.33, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 98.00 | | | | | | | | | | | |--- weights: [2.31, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 14.00 | | | | | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.34] class: 1 | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | |--- avg_price_per_room <= 157.50 | | | | | | | | | |--- no_of_children_1 <= 0.50 | | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- no_of_children_1 > 0.50 | | | | | | | | | | |--- lead_time <= 107.50 | | | | | | | | | | | |--- weights: [3.30, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 107.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- avg_price_per_room > 157.50 | | | | | | | | | |--- arrival_date <= 12.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_date > 12.50 | | | | | | | | | | |--- arrival_date <= 21.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 21.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | |--- lead_time > 150.50 | | | | | | | |--- avg_price_per_room <= 103.50 | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 103.50 | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | |--- weights: [0.00, 1.34] class: 1 | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | |--- weights: [0.00, 2.01] class: 1 | | | | | |--- arrival_month > 8.50 | | | | | | |--- avg_price_per_room 
<= 153.15 | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | |--- avg_price_per_room <= 71.12 | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 71.12 | | | | | | | | | |--- avg_price_per_room <= 90.42 | | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- avg_price_per_room > 90.42 | | | | | | | | | | |--- no_of_adults_1 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- no_of_adults_1 > 0.50 | | | | | | | | | | | |--- weights: [2.64, 0.00] class: 0 | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | |--- weights: [2.64, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 153.15 | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | |--- avg_price_per_room <= 164.25 | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 164.25 | | | | | | | | | |--- weights: [1.98, 0.00] class: 0 | | | | | | | |--- arrival_date > 22.50 | | | | | | | | |--- lead_time <= 106.50 | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | |--- lead_time > 106.50 | | | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | | |--- arrival_date > 23.50 | | | | | | | | | | |--- weights: [1.98, 0.00] class: 0 | | | | |--- no_of_special_requests > 2.25 | | | | | |--- weights: [29.70, 0.00] class: 0 |--- lead_time > 151.50 | |--- avg_price_per_room <= 100.04 | | |--- no_of_special_requests <= 0.50 | | | |--- no_of_adults_1 <= 0.50 | | | | |--- avg_price_per_room <= 82.47 | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- weights: [0.00, 88.44] class: 1 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- no_of_weekend_nights <= 
0.50 | | | | | | | | |--- avg_price_per_room <= 76.87 | | | | | | | | | |--- avg_price_per_room <= 69.03 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 69.03 | | | | | | | | | | |--- weights: [0.00, 4.69] class: 1 | | | | | | | | |--- avg_price_per_room > 76.87 | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | | |--- weights: [0.00, 3.35] class: 1 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- weights: [0.00, 30.15] class: 1 | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- lead_time <= 244.00 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- arrival_date <= 19.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 19.00 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- weights: [7.92, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- weights: [0.00, 5.36] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- avg_price_per_room <= 28.57 | | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | | | |--- avg_price_per_room > 28.57 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | |--- lead_time > 244.00 | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | |--- weights: 
[11.22, 0.00] class: 0 | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | |--- weights: [18.81, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | |--- weights: [1.98, 0.00] class: 0 | | | | |--- avg_price_per_room > 82.47 | | | | | |--- no_of_adults_3 <= 0.50 | | | | | | |--- type_of_meal_plan_Meal_Plan_2 <= 0.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | | | | | |--- weights: [0.00, 225.12] class: 1 | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | |--- weights: [0.00, 4.69] class: 1 | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | |--- weights: [1.98, 0.00] class: 0 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | |--- weights: [0.00, 9.38] class: 1 | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | 
|--- weights: [2.31, 0.00] class: 0 | | | | | | |--- type_of_meal_plan_Meal_Plan_2 > 0.50 | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | |--- weights: [0.00, 7.37] class: 1 | | | | | | | |--- arrival_date > 8.50 | | | | | | | | |--- weights: [2.31, 0.00] class: 0 | | | | | |--- no_of_adults_3 > 0.50 | | | | | | |--- weights: [2.31, 0.00] class: 0 | | | |--- no_of_adults_1 > 0.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- lead_time <= 163.50 | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | |--- weights: [1.32, 0.00] class: 0 | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | |--- lead_time <= 162.50 | | | | | | | | |--- weights: [0.33, 0.67] class: 1 | | | | | | | |--- lead_time > 162.50 | | | | | | | | |--- weights: [0.00, 10.05] class: 1 | | | | | |--- lead_time > 163.50 | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | |--- lead_time <= 173.00 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- weights: [20.13, 4.02] class: 0 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- avg_price_per_room <= 70.85 | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 70.85 | | | | | | | | | | |--- weights: [0.00, 6.03] class: 1 | | | | | | | |--- lead_time > 173.00 | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | | |--- weights: [0.00, 2.01] class: 1 | | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | | |--- arrival_date <= 24.50 | | | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 24.50 | | | | | | | | | | | |--- weights: [0.00, 1.34] class: 1 
| | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | |--- lead_time <= 278.00 | | | | | | | | | | |--- avg_price_per_room <= 57.29 | | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | | | |--- avg_price_per_room > 57.29 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- lead_time > 278.00 | | | | | | | | | | |--- avg_price_per_room <= 83.50 | | | | | | | | | | | |--- weights: [8.91, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 83.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | |--- lead_time <= 283.25 | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | |--- lead_time > 283.25 | | | | | | | | |--- avg_price_per_room <= 88.33 | | | | | | | | | |--- weights: [0.00, 4.69] class: 1 | | | | | | | | |--- avg_price_per_room > 88.33 | | | | | | | | | |--- weights: [0.33, 0.67] class: 1 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- avg_price_per_room <= 35.22 | | | | | | |--- lead_time <= 285.50 | | | | | | | |--- weights: [3.63, 0.00] class: 0 | | | | | | |--- lead_time > 285.50 | | | | | | | |--- arrival_date <= 11.00 | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | |--- arrival_date > 11.00 | | | | | | | | |--- weights: [0.00, 2.01] class: 1 | | | | | |--- avg_price_per_room > 35.22 | | | | | | |--- arrival_date <= 29.50 | | | | | | | |--- weights: [0.00, 32.83] class: 1 | | | | | | |--- arrival_date > 29.50 | | | | | | | |--- arrival_month <= 11.00 | | | | | | | | |--- weights: [0.00, 3.35] class: 1 | | | | | | | |--- arrival_month > 11.00 | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | |--- no_of_special_requests > 0.50 | | | |--- no_of_weekend_nights <= 0.50 | | | | |--- lead_time <= 180.50 | | | | | |--- lead_time <= 159.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- arrival_date <= 4.00 | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | |--- 
arrival_date > 4.00 | | | | | | | | |--- weights: [1.98, 0.00] class: 0 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- arrival_date <= 12.00 | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | |--- arrival_date > 12.00 | | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | | |--- arrival_date <= 19.00 | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | | |--- arrival_date > 19.00 | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | |--- arrival_date > 23.50 | | | | | | | | | |--- weights: [0.00, 2.68] class: 1 | | | | | |--- lead_time > 159.50 | | | | | | |--- arrival_date <= 1.50 | | | | | | | |--- lead_time <= 176.50 | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | |--- lead_time > 176.50 | | | | | | | | |--- weights: [0.00, 1.34] class: 1 | | | | | | |--- arrival_date > 1.50 | | | | | | | |--- no_of_children_2 <= 0.50 | | | | | | | | |--- weights: [15.51, 0.00] class: 0 | | | | | | | |--- no_of_children_2 > 0.50 | | | | | | | | |--- lead_time <= 172.00 | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | |--- lead_time > 172.00 | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | |--- lead_time > 180.50 | | | | | |--- no_of_special_requests <= 2.25 | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | |--- no_of_adults_2 <= 0.50 | | | | | | | | |--- weights: [0.00, 2.01] class: 1 | | | | | | | |--- no_of_adults_2 > 0.50 | | | | | | | | |--- lead_time <= 279.75 | | | | | | | | | |--- weights: [4.95, 0.00] class: 0 | | | | | | | | |--- lead_time > 279.75 | | | | | | | | | |--- weights: [0.66, 0.67] class: 1 | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- weights: [0.00, 83.75] class: 1 | | | | | | | | |--- arrival_month > 11.50 | | | | | 
| | | | |--- lead_time <= 272.00 | | | | | | | | | | |--- lead_time <= 226.50 | | | | | | | | | | | |--- weights: [0.00, 2.01] class: 1 | | | | | | | | | | |--- lead_time > 226.50 | | | | | | | | | | | |--- weights: [1.98, 0.00] class: 0 | | | | | | | | | |--- lead_time > 272.00 | | | | | | | | | | |--- avg_price_per_room <= 73.10 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 73.10 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | |--- no_of_special_requests > 2.25 | | | | | | |--- weights: [3.96, 0.00] class: 0 | | | |--- no_of_weekend_nights > 0.50 | | | | |--- market_segment_type_Offline <= 0.50 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- avg_price_per_room <= 76.48 | | | | | | | |--- no_of_weekend_nights <= 3.00 | | | | | | | | |--- lead_time <= 245.50 | | | | | | | | | |--- weights: [15.18, 0.00] class: 0 | | | | | | | | |--- lead_time > 245.50 | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | |--- arrival_date <= 12.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 12.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | |--- no_of_weekend_nights > 3.00 | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | |--- avg_price_per_room > 76.48 | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | | |--- lead_time <= 233.00 | | | | | | | | | | |--- lead_time <= 152.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 152.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | |--- lead_time > 233.00 | | | | | | | | | | |--- no_of_children_2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- no_of_children_2 > 0.50 | | | | | | | | | | | |--- 
truncated branch of depth 2 | | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | | |--- avg_price_per_room <= 81.81 | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 81.81 | | | | | | | | | | |--- lead_time <= 208.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- lead_time > 208.50 | | | | | | | | | | | |--- weights: [0.00, 3.35] class: 1 | | | | | | | |--- arrival_date > 27.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- lead_time <= 234.00 | | | | | | | | | | |--- lead_time <= 175.50 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 175.50 | | | | | | | | | | | |--- weights: [0.00, 6.70] class: 1 | | | | | | | | | |--- lead_time > 234.00 | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- lead_time <= 269.00 | | | | | | | | | | |--- lead_time <= 176.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 176.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 269.00 | | | | | | | | | | |--- weights: [0.00, 2.01] class: 1 | | | | | |--- arrival_month > 11.50 | | | | | | |--- arrival_date <= 14.50 | | | | | | | |--- arrival_date <= 3.00 | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | |--- arrival_date > 3.00 | | | | | | | | |--- avg_price_per_room <= 64.43 | | | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 8.50 | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | |--- avg_price_per_room > 64.43 | | | | | | | | | |--- weights: [2.64, 0.00] class: 0 | | | | | | |--- arrival_date > 14.50 | | | | | | | |--- avg_price_per_room <= 55.92 | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 55.92 | | | | 
| | | | |--- no_of_special_requests <= 2.25 | | | | | | | | | |--- avg_price_per_room <= 80.19 | | | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 80.19 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | |--- no_of_special_requests > 2.25 | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | |--- market_segment_type_Offline > 0.50 | | | | | |--- no_of_week_nights <= 5.50 | | | | | | |--- lead_time <= 284.25 | | | | | | | |--- arrival_date <= 30.00 | | | | | | | | |--- weights: [39.27, 0.00] class: 0 | | | | | | | |--- arrival_date > 30.00 | | | | | | | | |--- lead_time <= 168.00 | | | | | | | | | |--- weights: [0.66, 0.67] class: 1 | | | | | | | | |--- lead_time > 168.00 | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | |--- lead_time > 284.25 | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | |--- weights: [4.29, 0.00] class: 0 | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | |--- avg_price_per_room <= 84.00 | | | | | | | | | | |--- avg_price_per_room <= 58.50 | | | | | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 58.50 | | | | | | | | | | | |--- weights: [1.98, 1.34] class: 0 | | | | | | | | | |--- avg_price_per_room > 84.00 | | | | | | | | | | |--- weights: [0.33, 0.67] class: 1 | | | | | |--- no_of_week_nights > 5.50 | | | | | | |--- lead_time <= 167.00 | | | | | | | |--- weights: [0.33, 0.00] class: 0 | | | | | | |--- lead_time > 167.00 | | | | | 
| | |--- weights: [0.00, 0.67] class: 1 | |--- avg_price_per_room > 100.04 | | |--- arrival_month <= 11.50 | | | |--- no_of_special_requests <= 2.25 | | | | |--- no_of_adults_2 <= 0.50 | | | | | |--- weights: [0.00, 326.29] class: 1 | | | | |--- no_of_adults_2 > 0.50 | | | | | |--- weights: [0.00, 1086.07] class: 1 | | | |--- no_of_special_requests > 2.25 | | | | |--- weights: [10.23, 0.00] class: 0 | | |--- arrival_month > 11.50 | | | |--- no_of_special_requests <= 0.50 | | | | |--- weights: [15.51, 0.00] class: 0 | | | |--- no_of_special_requests > 0.50 | | | | |--- arrival_date <= 24.50 | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | |--- arrival_date > 24.50 | | | | | |--- lead_time <= 172.50 | | | | | | |--- arrival_date <= 28.00 | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | |--- arrival_date > 28.00 | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | |--- lead_time > 172.50 | | | | | | |--- no_of_special_requests <= 1.50 | | | | | | | |--- weights: [0.00, 6.03] class: 1 | | | | | | |--- no_of_special_requests > 1.50 | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | |--- arrival_date <= 27.00 | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | |--- arrival_date > 27.00 | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | |--- weights: [0.00, 2.68] class: 1
# Importance of features in the tree building. The importance of a feature is computed as the
# (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.
print(
    pd.DataFrame(
        model.feature_importances_, columns=["Imp"], index=X_train.columns
    ).sort_values(by="Imp", ascending=False)
)
                                           Imp
lead_time                             0.351017
avg_price_per_room                    0.141231
market_segment_type_Online            0.094474
no_of_special_requests                0.086353
arrival_date                          0.084464
arrival_month                         0.064144
no_of_week_nights                     0.048612
no_of_weekend_nights                  0.035611
no_of_adults_1                        0.018386
arrival_year_2018                     0.015615
market_segment_type_Offline           0.009879
no_of_adults_2                        0.008857
type_of_meal_plan_Not_Selected        0.008368
required_car_parking_space_1          0.007468
room_type_reserved_Room_Type 4        0.005837
no_of_children_1                      0.003426
room_type_reserved_Room_Type 2        0.003219
type_of_meal_plan_Meal_Plan_2         0.002965
no_of_adults_3                        0.001968
no_of_children_2                      0.001756
repeated_guest_1                      0.001584
room_type_reserved_Room_Type 6        0.001529
market_segment_type_Corporate         0.001475
room_type_reserved_Room_Type 5        0.001032
room_type_reserved_Room_Type 7        0.000362
no_of_adults_4                        0.000204
no_of_children_3                      0.000161
type_of_meal_plan_Meal_Plan_3         0.000000
no_of_children_9                      0.000000
room_type_reserved_Room_Type 3        0.000000
market_segment_type_Complementary     0.000000
no_of_previous_bookings_not_canceled  0.000000
no_of_previous_cancellations          0.000000
no_of_children_10                     0.000000
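The importances printed above are normalized, so they are non-negative and add up to 1. A minimal sketch of that property follows. It uses a synthetic dataset from `make_classification` as a stand-in for the hotel data, so the feature indices and values it prints are illustrative only and are not results from this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the INN Hotels dataset)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=1
)

clf_demo = DecisionTreeClassifier(random_state=1).fit(X_demo, y_demo)
importances_demo = clf_demo.feature_importances_

# Print features from most to least important
for i in np.argsort(importances_demo)[::-1]:
    print(f"feature_{i}: {importances_demo[i]:.4f}")

# Gini importances are normalized, so they should sum to 1
total_importance = importances_demo.sum()
print("Sum of importances:", round(total_importance, 6))
```

Because the values are shares of one total, a feature with importance 0.35 accounts for roughly 35% of the total impurity reduction achieved by the tree.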
importances = model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
The decision tree created is very complex, so we will reduce overfitting using two methods: hyperparameter tuning and cost complexity pruning.
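As a rough illustration of the second method, the sketch below computes a cost complexity pruning path with scikit-learn's `cost_complexity_pruning_path` and refits trees at a few values of `ccp_alpha`. It runs on synthetic data from `make_classification`, which is an assumption made only for this example. The variable names (`X_demo`, `picked_alphas`, `node_counts`) are hypothetical and are not part of this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the INN Hotels dataset)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=1
)

# Effective alphas at which subtrees of the fully grown tree get pruned
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(
    X_demo, y_demo
)
# Guard against tiny negative values caused by floating-point error
ccp_alphas = np.clip(path.ccp_alphas, 0, None)

# Refit at a handful of alphas spread along the path
step = max(1, len(ccp_alphas) // 5)
picked_alphas = ccp_alphas[::step]

node_counts = []
for alpha in picked_alphas:
    pruned = DecisionTreeClassifier(random_state=1, ccp_alpha=alpha)
    pruned.fit(X_demo, y_demo)
    node_counts.append(pruned.tree_.node_count)
    print(f"ccp_alpha={alpha:.5f}  nodes={pruned.tree_.node_count}")
```

Larger values of `ccp_alpha` prune more aggressively, so the node count should not increase as alpha grows. In practice the alpha would be chosen by comparing a metric such as recall on held-out data.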
# Choose the type of classifier.
estimator = DecisionTreeClassifier(random_state=1, class_weight={0: 0.33, 1: 0.67})
# Grid of parameters to choose from
parameters = {
    "max_depth": [5, 10, 15],
    "criterion": ["entropy", "gini"],
    "splitter": ["best", "random"],
    "min_impurity_decrease": [0.00001, 0.0001, 0.01],
}
# Type of scoring used to compare parameter combinations
scorer = make_scorer(recall_score)
# Run the grid search
grid_obj = GridSearchCV(estimator, parameters, scoring=scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
estimator = grid_obj.best_estimator_
# Fit the best algorithm to the data.
estimator.fit(X_train, y_train)
DecisionTreeClassifier(class_weight={0: 0.33, 1: 0.67}, max_depth=15,
min_impurity_decrease=0.0001, random_state=1)
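The `class_weight={0: 0.33, 1: 0.67}` setting also appears to explain the `weights:` values in the tree text reports: each entry looks like a class-weighted sample count rather than a raw count. The sketch below converts a leaf's weights back to approximate sample counts under that assumption. The helper `leaf_sample_counts` is hypothetical and written only for illustration.

```python
# Class weights used by the estimator in this notebook
CLASS_WEIGHT = {0: 0.33, 1: 0.67}


def leaf_sample_counts(weights):
    """Convert weighted per-class totals from a leaf into approximate raw counts.

    Assumes each value is (number of samples in that class) * (class weight).
    """
    return [round(w / CLASS_WEIGHT[cls]) for cls, w in enumerate(weights)]


# A leaf taken from the printed tree: "weights: [20.13, 4.02] class: 0"
counts = leaf_sample_counts([20.13, 4.02])
print(counts)
```

Under this reading, a leaf is labeled with the class that has the larger weighted total, which can differ from the class with more raw samples.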
# creating the confusion matrix
confusion_matrix_sklearn(estimator, X_train, y_train)
decision_tree_tune_perf_train = get_recall_score(estimator, X_train, y_train)
print("Recall Score:", decision_tree_tune_perf_train)
Recall Score: 0.9048188449121128
# creating the confusion matrix
confusion_matrix_sklearn(estimator, X_test, y_test)
decision_tree_tune_perf_test = get_recall_score(estimator, X_test, y_test)
print("Recall Score:", decision_tree_tune_perf_test)
Recall Score: 0.8657013060760931
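One simple way to read the two recall scores above is to look at the gap between them. The sketch below does that arithmetic with the values printed by this notebook. The variable names are hypothetical, and what counts as an acceptable gap is a judgment call, not something the notebook specifies.

```python
# Recall scores printed above for the tuned decision tree
train_recall = 0.9048188449121128
test_recall = 0.8657013060760931

# Absolute gap and the drop relative to training recall
gap = train_recall - test_recall
relative_drop = gap / train_recall

print(f"Absolute gap: {gap:.4f}")
print(f"Relative drop: {relative_drop:.2%}")
```

A small gap suggests the tuned tree generalizes reasonably well to the test set.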
# plotting the decision tree
plt.figure(figsize=(15, 10))
out = tree.plot_tree(
    estimator,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of a decision tree -
print(tree.export_text(estimator, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_weekend_nights <= 0.50 | | | | | |--- avg_price_per_room <= 179.47 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- lead_time <= 16.50 | | | | | | | | |--- avg_price_per_room <= 68.50 | | | | | | | | | |--- weights: [91.74, 4.69] class: 0 | | | | | | | | |--- avg_price_per_room > 68.50 | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | |--- no_of_adults_2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_adults_2 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | |--- weights: [0.99, 3.35] class: 1 | | | | | | | |--- lead_time > 16.50 | | | | | | | | |--- avg_price_per_room <= 135.00 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- repeated_guest_1 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- repeated_guest_1 > 0.50 | | | | | | | | | | | |--- weights: [4.95, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [9.57, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 135.00 | | | | | | | | | |--- weights: [0.00, 5.36] class: 1 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- weights: [529.98, 0.00] class: 0 | | | | | |--- avg_price_per_room > 179.47 | | | | | | |--- weights: [1.32, 11.39] class: 1 | | | | |--- no_of_weekend_nights > 0.50 | | | | | |--- lead_time <= 68.50 | | | | | | |--- arrival_month <= 9.50 | | | | | | | |--- avg_price_per_room <= 63.29 | | | | | | | | |--- arrival_date <= 20.50 | | | | | | | | | |--- type_of_meal_plan_Not_Selected <= 0.50 | | | | | | | | | | |--- weights: [18.48, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not_Selected > 0.50 | | | | | | | | | | |--- weights: [0.33, 1.34] class: 
1 | | | | | | | | |--- arrival_date > 20.50 | | | | | | | | | |--- avg_price_per_room <= 59.75 | | | | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | | | | |--- weights: [0.66, 5.36] class: 1 | | | | | | | | | | |--- arrival_date > 23.50 | | | | | | | | | | | |--- weights: [6.60, 0.67] class: 0 | | | | | | | | | |--- avg_price_per_room > 59.75 | | | | | | | | | | |--- lead_time <= 44.00 | | | | | | | | | | | |--- weights: [0.33, 26.13] class: 1 | | | | | | | | | | |--- lead_time > 44.00 | | | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 63.29 | | | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | | | |--- lead_time <= 59.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- weights: [136.62, 16.75] class: 0 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- lead_time > 59.50 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | | |--- weights: [8.91, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | | | |--- weights: [0.33, 6.70] class: 1 | | | | | | |--- arrival_month > 9.50 | | | | | | | |--- weights: [182.82, 12.06] class: 0 | | | | | |--- lead_time > 68.50 | | | | | | |--- avg_price_per_room <= 99.98 | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | |--- avg_price_per_room <= 62.50 | | | | | | | | | |--- weights: [6.93, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 62.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- lead_time <= 81.50 | | | | | | | | | | | |--- weights: [2.64, 11.39] class: 1 | | | | | | | | | | |--- lead_time > 81.50 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | |--- 
arrival_month > 3.50 | | | | | | | | |--- weights: [33.99, 5.36] class: 0 | | | | | | |--- avg_price_per_room > 99.98 | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | |--- weights: [3.96, 0.00] class: 0 | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | |--- avg_price_per_room <= 132.43 | | | | | | | | | |--- weights: [4.29, 54.27] class: 1 | | | | | | | | |--- avg_price_per_room > 132.43 | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- lead_time <= 117.50 | | | | | |--- avg_price_per_room <= 93.58 | | | | | | |--- avg_price_per_room <= 75.07 | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | | | |--- weights: [2.64, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 58.75 | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | | |--- weights: [0.99, 52.26] class: 1 | | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | |--- weights: [1.98, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | |--- arrival_date <= 11.50 | | | | | | | | | |--- weights: [13.86, 0.00] class: 0 | | | | | | | | |--- arrival_date > 11.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 71.12 | | | | | | | | | | | |--- weights: [6.60, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 71.12 | | | | | | | | | | | |--- weights: [3.63, 2.68] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- weights: [2.64, 4.02] class: 1 | | | | | | |--- avg_price_per_room > 75.07 | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | |--- weights: [26.40, 1.34] class: 0 | | | | | | | |--- arrival_month > 3.50 | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | 
|--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- weights: [0.00, 7.37] class: 1 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | |--- no_of_adults_2 <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 86.68 | | | | | | | | | | | |--- weights: [0.99, 8.04] class: 1 | | | | | | | | | | |--- avg_price_per_room > 86.68 | | | | | | | | | | | |--- weights: [4.62, 1.34] class: 0 | | | | | | | | | |--- no_of_adults_2 > 0.50 | | | | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | | | | |--- weights: [19.47, 1.34] class: 0 | | | | | | | | | | |--- arrival_date > 22.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | |--- avg_price_per_room > 93.58 | | | | | | |--- arrival_date <= 11.50 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- lead_time <= 107.50 | | | | | | | | | |--- weights: [4.29, 4.02] class: 0 | | | | | | | | |--- lead_time > 107.50 | | | | | | | | | |--- weights: [2.97, 13.40] class: 1 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- weights: [8.91, 2.68] class: 0 | | | | | | |--- arrival_date > 11.50 | | | | | | | |--- avg_price_per_room <= 102.09 | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | |--- weights: [1.65, 63.65] class: 1 | | | | | | | |--- avg_price_per_room > 102.09 | | | | | | | | |--- avg_price_per_room <= 109.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [0.33, 7.37] class: 1 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- weights: [14.85, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 109.50 | | | | | | | | | |--- avg_price_per_room <= 124.25 | | | | | | | | | | |--- weights: [1.32, 33.50] class: 1 | | | | | | | | | |--- avg_price_per_room > 124.25 | | | | | | | | | | |--- weights: 
[1.65, 1.34] class: 0 | | | | |--- lead_time > 117.50 | | | | | |--- no_of_adults_2 <= 0.50 | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | |--- weights: [0.33, 1.34] class: 1 | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | |--- weights: [49.17, 0.67] class: 0 | | | | | |--- no_of_adults_2 > 0.50 | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | |--- weights: [21.12, 0.67] class: 0 | | | | | | | |--- arrival_date > 7.50 | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | |--- avg_price_per_room <= 89.88 | | | | | | | | | | |--- weights: [11.55, 8.04] class: 0 | | | | | | | | | |--- avg_price_per_room > 89.88 | | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | | |--- weights: [7.26, 35.51] class: 1 | | | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | | | |--- weights: [5.28, 4.02] class: 0 | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | |--- weights: [5.61, 0.67] class: 0 | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | |--- arrival_date <= 20.50 | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | |--- lead_time <= 144.50 | | | | | | | | | | |--- arrival_date <= 17.50 | | | | | | | | | | | |--- weights: [10.23, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 17.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 144.50 | | | | | | | | | | |--- weights: [1.65, 2.68] class: 1 | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | |--- weights: [0.33, 2.01] class: 1 | | | | | | | |--- arrival_date > 20.50 | | | | | | | | |--- weights: [26.07, 0.67] class: 0 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 13.50 | | | | |--- avg_price_per_room <= 99.44 | | | | | |--- arrival_month <= 1.50 | | | | | | |--- weights: [40.92, 0.00] class: 0 | | | | | |--- arrival_month > 1.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | 
| |--- avg_price_per_room <= 70.05 | | | | | | | | | |--- weights: [13.86, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 70.05 | | | | | | | | | |--- lead_time <= 5.50 | | | | | | | | | | |--- no_of_adults_2 <= 0.50 | | | | | | | | | | | |--- weights: [17.16, 0.67] class: 0 | | | | | | | | | | |--- no_of_adults_2 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 5.50 | | | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | |--- no_of_adults_2 <= 0.50 | | | | | | | | | |--- weights: [0.00, 8.71] class: 1 | | | | | | | | |--- no_of_adults_2 > 0.50 | | | | | | | | | |--- lead_time <= 2.50 | | | | | | | | | | |--- avg_price_per_room <= 74.21 | | | | | | | | | | | |--- weights: [0.33, 1.34] class: 1 | | | | | | | | | | |--- avg_price_per_room > 74.21 | | | | | | | | | | | |--- weights: [4.29, 0.00] class: 0 | | | | | | | | | |--- lead_time > 2.50 | | | | | | | | | | |--- weights: [1.98, 4.69] class: 1 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | |--- weights: [68.64, 2.68] class: 0 | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- lead_time <= 3.50 | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | |--- lead_time > 3.50 | | | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 4.69] class: 1 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [3.30, 0.00] class: 0 | | | | |--- avg_price_per_room > 99.44 | | | | | |--- lead_time <= 3.50 | | | | | | |--- avg_price_per_room <= 178.78 | | | | | | | |--- 
no_of_week_nights <= 4.50 | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | |--- type_of_meal_plan_Not_Selected <= 0.50 | | | | | | | | | | |--- weights: [21.12, 13.40] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not_Selected > 0.50 | | | | | | | | | | |--- weights: [4.62, 0.00] class: 0 | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | |--- weights: [64.35, 10.05] class: 0 | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | |--- weights: [0.00, 2.68] class: 1 | | | | | | |--- avg_price_per_room > 178.78 | | | | | | | |--- weights: [7.26, 11.39] class: 1 | | | | | |--- lead_time > 3.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- weights: [27.06, 102.51] class: 1 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | |--- weights: [11.55, 0.67] class: 0 | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_date <= 14.00 | | | | | | | | | | |--- arrival_date <= 1.50 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 1.50 | | | | | | | | | | | |--- weights: [3.30, 16.08] class: 1 | | | | | | | | | |--- arrival_date > 14.00 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- weights: [4.95, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [6.93, 0.00] class: 0 | | | |--- lead_time > 13.50 | | | | |--- required_car_parking_space_1 <= 0.50 | | | | | |--- avg_price_per_room <= 71.92 | | | | | | |--- avg_price_per_room <= 59.43 | | | | | | | |--- lead_time <= 84.50 | | | | | | | | |--- weights: [22.44, 3.35] class: 0 | | | | | | | |--- lead_time > 84.50 | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | |--- arrival_date <= 27.00 | | | | | | | | | | |--- lead_time <= 131.50 | | | | | 
| | | | | | |--- weights: [0.33, 6.70] class: 1 | | | | | | | | | | |--- lead_time > 131.50 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 27.00 | | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | |--- weights: [4.62, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 59.43 | | | | | | | |--- lead_time <= 25.50 | | | | | | | | |--- weights: [9.24, 2.68] class: 0 | | | | | | | |--- lead_time > 25.50 | | | | | | | | |--- avg_price_per_room <= 71.34 | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | |--- weights: [12.21, 42.88] class: 1 | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | |--- lead_time <= 102.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 102.00 | | | | | | | | | | | |--- weights: [5.61, 1.34] class: 0 | | | | | | | | |--- avg_price_per_room > 71.34 | | | | | | | | | |--- weights: [4.95, 0.00] class: 0 | | | | | |--- avg_price_per_room > 71.92 | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | |--- lead_time <= 65.50 | | | | | | | | |--- avg_price_per_room <= 120.45 | | | | | | | | | |--- weights: [35.31, 4.02] class: 0 | | | | | | | | |--- avg_price_per_room > 120.45 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- arrival_date <= 14.00 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 14.00 | | | | | | | | | | | |--- weights: [0.66, 5.36] class: 1 | | | | | | | |--- lead_time > 65.50 | | | | | | | | |--- type_of_meal_plan_Meal_Plan_2 <= 0.50 | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | |--- weights: [7.26, 20.77] class: 1 | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | | | | | |--- 
type_of_meal_plan_Meal_Plan_2 > 0.50 | | | | | | | | | |--- weights: [0.00, 28.14] class: 1 | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | |--- avg_price_per_room <= 104.31 | | | | | | | | |--- lead_time <= 25.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | | |--- weights: [7.26, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | | |--- weights: [17.16, 52.26] class: 1 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [10.23, 0.00] class: 0 | | | | | | | | |--- lead_time > 25.50 | | | | | | | | | |--- type_of_meal_plan_Not_Selected <= 0.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- weights: [17.49, 81.74] class: 1 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- type_of_meal_plan_Not_Selected > 0.50 | | | | | | | | | | |--- weights: [32.67, 181.57] class: 1 | | | | | | | |--- avg_price_per_room > 104.31 | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | | |--- weights: [144.54, 957.43] class: 1 | | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 22.50 | | | | | | | | | | | |--- weights: [0.33, 4.02] class: 1 | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | |--- avg_price_per_room <= 168.06 | | | | | | | | | | |--- lead_time <= 22.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 22.00 | | | | | | | | | | | |--- weights: [7.59, 36.85] class: 1 | | | | | | | | | |--- avg_price_per_room > 168.06 | | | | | | | | | | |--- weights: [5.61, 2.68] class: 0 | | | | |--- required_car_parking_space_1 > 0.50 | | | | | |--- no_of_weekend_nights <= 3.00 | | | 
| | | |--- weights: [21.45, 0.00] class: 0 | | | | | |--- no_of_weekend_nights > 3.00 | | | | | | |--- weights: [0.00, 0.67] class: 1 | |--- no_of_special_requests > 0.50 | | |--- no_of_special_requests <= 1.50 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- lead_time <= 102.50 | | | | | |--- type_of_meal_plan_Not_Selected <= 0.50 | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | |--- weights: [308.55, 3.35] class: 0 | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | |--- type_of_meal_plan_Not_Selected > 0.50 | | | | | | |--- lead_time <= 63.00 | | | | | | | |--- weights: [6.93, 0.67] class: 0 | | | | | | |--- lead_time > 63.00 | | | | | | | |--- weights: [0.00, 3.35] class: 1 | | | | |--- lead_time > 102.50 | | | | | |--- no_of_week_nights <= 2.50 | | | | | | |--- lead_time <= 105.00 | | | | | | | |--- weights: [0.33, 2.68] class: 1 | | | | | | |--- lead_time > 105.00 | | | | | | | |--- avg_price_per_room <= 83.39 | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | |--- weights: [1.98, 0.00] class: 0 | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | |--- arrival_date <= 6.50 | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 6.50 | | | | | | | | | | |--- weights: [0.66, 4.02] class: 1 | | | | | | | |--- avg_price_per_room > 83.39 | | | | | | | | |--- weights: [10.23, 2.01] class: 0 | | | | | |--- no_of_week_nights > 2.50 | | | | | | |--- avg_price_per_room <= 122.00 | | | | | | | |--- weights: [18.81, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 122.00 | | | | | | | |--- weights: [0.99, 1.34] class: 1 | | | |--- market_segment_type_Online > 0.50 | | | | |--- lead_time <= 8.50 | | | | | |--- lead_time <= 4.50 | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | |--- weights: [220.11, 18.09] class: 0 | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | |--- weights: [0.33, 1.34] class: 1 | | | | | |--- 
lead_time > 4.50 | | | | | | |--- arrival_date <= 13.50 | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 88.39 | | | | | | | | | | |--- weights: [7.59, 0.67] class: 0 | | | | | | | | | |--- avg_price_per_room > 88.39 | | | | | | | | | | |--- weights: [18.15, 13.40] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | |--- weights: [0.33, 2.01] class: 1 | | | | | | | |--- arrival_month > 9.50 | | | | | | | | |--- weights: [14.85, 0.67] class: 0 | | | | | | |--- arrival_date > 13.50 | | | | | | | |--- type_of_meal_plan_Not_Selected <= 0.50 | | | | | | | | |--- weights: [54.78, 4.02] class: 0 | | | | | | | |--- type_of_meal_plan_Not_Selected > 0.50 | | | | | | | | |--- avg_price_per_room <= 126.33 | | | | | | | | | |--- weights: [14.52, 1.34] class: 0 | | | | | | | | |--- avg_price_per_room > 126.33 | | | | | | | | | |--- weights: [4.29, 6.03] class: 1 | | | | |--- lead_time > 8.50 | | | | | |--- required_car_parking_space_1 <= 0.50 | | | | | | |--- avg_price_per_room <= 118.55 | | | | | | | |--- lead_time <= 61.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [31.02, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- no_of_weekend_nights <= 4.50 | | | | | | | | | | |--- weights: [56.10, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 4.50 | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | |--- lead_time > 61.50 | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- no_of_children_1 <= 0.50 | 
| | | | | | | | | | |--- weights: [1.32, 25.46] class: 1 | | | | | | | | | | |--- no_of_children_1 > 0.50 | | | | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- lead_time <= 66.50 | | | | | | | | | | | |--- weights: [2.31, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 66.50 | | | | | | | | | | | |--- weights: [16.50, 24.12] class: 1 | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- avg_price_per_room <= 71.93 | | | | | | | | | | | |--- weights: [24.09, 1.34] class: 0 | | | | | | | | | | |--- avg_price_per_room > 71.93 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | |--- avg_price_per_room > 118.55 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | |--- avg_price_per_room <= 177.15 | | | | | | | | | | |--- avg_price_per_room <= 118.98 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 118.98 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- avg_price_per_room > 177.15 | | | | | | | | | | |--- arrival_date <= 7.00 | | | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 7.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | |--- avg_price_per_room <= 121.20 | | | | | | | | | | | |--- weights: [8.25, 2.68] class: 0 | | | | | | | | | | |--- avg_price_per_room > 121.20 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | 
| |--- lead_time <= 55.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 55.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- weights: [5.28, 4.69] class: 0 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- weights: [16.50, 0.00] class: 0 | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- weights: [108.24, 146.73] class: 1 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- lead_time <= 100.00 | | | | | | | | | | | |--- weights: [22.11, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 100.00 | | | | | | | | | | | |--- weights: [0.33, 8.04] class: 1 | | | | | |--- required_car_parking_space_1 > 0.50 | | | | | | |--- room_type_reserved_Room_Type 7 <= 0.50 | | | | | | | |--- weights: [59.40, 0.00] class: 0 | | | | | | |--- room_type_reserved_Room_Type 7 > 0.50 | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 90.50 | | | | |--- no_of_week_nights <= 3.50 | | | | | |--- weights: [701.58, 0.00] class: 0 | | | | |--- no_of_week_nights > 3.50 | | | | | |--- no_of_special_requests <= 2.25 | | | | | | |--- lead_time <= 6.50 | | | | | | | |--- weights: [14.19, 0.67] class: 0 | | | | | | |--- lead_time > 6.50 | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- weights: [5.61, 0.00] class: 0 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | | |--- weights: [21.78, 16.75] class: 0 | | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | | |--- weights: [18.48, 5.36] class: 0 | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | |--- weights: [19.80, 2.68] 
class: 0 | | | | | |--- no_of_special_requests > 2.25 | | | | | | |--- weights: [23.10, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- no_of_special_requests <= 2.25 | | | | | |--- arrival_month <= 8.50 | | | | | | |--- lead_time <= 150.50 | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | |--- weights: [0.66, 4.02] class: 1 | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | |--- arrival_date <= 14.00 | | | | | | | | | | |--- weights: [2.64, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 14.00 | | | | | | | | | | |--- weights: [0.33, 1.34] class: 1 | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | |--- avg_price_per_room <= 157.50 | | | | | | | | | |--- no_of_children_1 <= 0.50 | | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | | |--- weights: [61.71, 4.69] class: 0 | | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | | |--- weights: [0.99, 1.34] class: 1 | | | | | | | | | |--- no_of_children_1 > 0.50 | | | | | | | | | | |--- lead_time <= 107.50 | | | | | | | | | | | |--- weights: [3.30, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 107.50 | | | | | | | | | | | |--- weights: [4.62, 4.69] class: 1 | | | | | | | | |--- avg_price_per_room > 157.50 | | | | | | | | | |--- weights: [6.93, 5.36] class: 0 | | | | | | |--- lead_time > 150.50 | | | | | | | |--- weights: [0.66, 3.35] class: 1 | | | | | |--- arrival_month > 8.50 | | | | | | |--- avg_price_per_room <= 153.15 | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | |--- weights: [38.94, 45.56] class: 1 | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | |--- weights: [2.64, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 153.15 | | | | | | | |--- weights: [5.61, 1.34] class: 0 | | | | |--- no_of_special_requests > 2.25 | | | | | |--- weights: [29.70, 0.00] class: 0 |--- lead_time > 151.50 | |--- avg_price_per_room <= 100.04 | | |--- 
no_of_special_requests <= 0.50 | | | |--- no_of_adults_1 <= 0.50 | | | | |--- avg_price_per_room <= 82.47 | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | |--- weights: [1.32, 126.63] class: 1 | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- lead_time <= 244.00 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- arrival_date <= 19.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 19.00 | | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- weights: [7.92, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- weights: [4.95, 1.34] class: 0 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- weights: [0.00, 5.36] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- weights: [33.33, 5.36] class: 0 | | | | | | | |--- lead_time > 244.00 | | | | | | | | |--- arrival_year_2018 <= 0.50 | | | | | | | | | |--- weights: [11.22, 0.00] class: 0 | | | | | | | | |--- arrival_year_2018 > 0.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- weights: [4.95, 116.58] class: 1 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [3.30, 0.00] class: 0 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- weights: [20.79, 0.00] class: 0 | | | | |--- avg_price_per_room > 82.47 | | | | | |--- no_of_adults_3 <= 0.50 | | | | | | |--- weights: [10.56, 458.95] class: 1 | | | | | |--- no_of_adults_3 > 0.50 | | | | | | |--- 
weights: [2.31, 0.00] class: 0 | | | |--- no_of_adults_1 > 0.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- lead_time <= 163.50 | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | |--- weights: [1.32, 0.00] class: 0 | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | |--- weights: [0.33, 10.72] class: 1 | | | | | |--- lead_time > 163.50 | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | |--- lead_time <= 173.00 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- weights: [20.79, 4.02] class: 0 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- arrival_month <= 5.00 | | | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 5.00 | | | | | | | | | | |--- weights: [0.00, 6.03] class: 1 | | | | | | | |--- lead_time > 173.00 | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | | |--- weights: [0.00, 2.01] class: 1 | | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | | |--- arrival_date <= 24.50 | | | | | | | | | | | |--- weights: [2.97, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 24.50 | | | | | | | | | | | |--- weights: [0.00, 1.34] class: 1 | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | |--- lead_time <= 278.00 | | | | | | | | | | |--- avg_price_per_room <= 57.29 | | | | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | | | | | |--- avg_price_per_room > 57.29 | | | | | | | | | | | |--- weights: [63.03, 1.34] class: 0 | | | | | | | | | |--- lead_time > 278.00 | | | | | | | | | | |--- weights: [25.08, 6.70] class: 0 | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | |--- weights: [0.99, 5.36] class: 1 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- avg_price_per_room <= 35.22 | | | | | | |--- lead_time <= 285.50 | | | | | | | |--- weights: [3.63, 0.00] class: 0 | | | | | | |--- lead_time > 285.50 | | | | | | | |--- weights: [0.33, 2.01] class: 1 | | | | | |--- 
avg_price_per_room > 35.22 | | | | | | |--- weights: [0.33, 36.18] class: 1 | | |--- no_of_special_requests > 0.50 | | | |--- no_of_weekend_nights <= 0.50 | | | | |--- lead_time <= 180.50 | | | | | |--- lead_time <= 159.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- weights: [2.64, 0.00] class: 0 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- weights: [0.66, 3.35] class: 1 | | | | | |--- lead_time > 159.50 | | | | | | |--- arrival_date <= 1.50 | | | | | | | |--- weights: [0.66, 1.34] class: 1 | | | | | | |--- arrival_date > 1.50 | | | | | | | |--- weights: [15.84, 0.67] class: 0 | | | | |--- lead_time > 180.50 | | | | | |--- no_of_special_requests <= 2.25 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- avg_price_per_room <= 44.12 | | | | | | | | |--- weights: [0.66, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 44.12 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- weights: [0.00, 83.75] class: 1 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [2.64, 7.37] class: 1 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- no_of_adults_2 <= 0.50 | | | | | | | | |--- weights: [0.00, 2.01] class: 1 | | | | | | | |--- no_of_adults_2 > 0.50 | | | | | | | | |--- weights: [5.61, 0.67] class: 0 | | | | | |--- no_of_special_requests > 2.25 | | | | | | |--- weights: [3.96, 0.00] class: 0 | | | |--- no_of_weekend_nights > 0.50 | | | | |--- market_segment_type_Offline <= 0.50 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- avg_price_per_room <= 76.48 | | | | | | | |--- no_of_weekend_nights <= 3.00 | | | | | | | | |--- weights: [20.79, 1.34] class: 0 | | | | | | | |--- no_of_weekend_nights > 3.00 | | | | | | | | |--- weights: [0.00, 0.67] class: 1 | | | | | | |--- avg_price_per_room > 76.48 | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | | |--- lead_time <= 233.00 | | | | | | | | | | |--- lead_time <= 
152.50 | | | | | | | | | | | |--- weights: [0.66, 2.01] class: 1 | | | | | | | | | | |--- lead_time > 152.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- lead_time > 233.00 | | | | | | | | | | |--- no_of_children_2 <= 0.50 | | | | | | | | | | | |--- weights: [10.23, 6.70] class: 0 | | | | | | | | | | |--- no_of_children_2 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.01] class: 1 | | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | | |--- weights: [3.96, 7.37] class: 1 | | | | | | | |--- arrival_date > 27.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- weights: [0.99, 6.70] class: 1 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- lead_time <= 269.00 | | | | | | | | | | |--- lead_time <= 176.00 | | | | | | | | | | | |--- weights: [0.99, 3.35] class: 1 | | | | | | | | | | |--- lead_time > 176.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 269.00 | | | | | | | | | | |--- weights: [0.00, 2.01] class: 1 | | | | | |--- arrival_month > 11.50 | | | | | | |--- arrival_date <= 14.50 | | | | | | | |--- weights: [3.63, 1.34] class: 0 | | | | | | |--- arrival_date > 14.50 | | | | | | | |--- avg_price_per_room <= 55.92 | | | | | | | | |--- weights: [0.99, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 55.92 | | | | | | | | |--- weights: [3.96, 14.07] class: 1 | | | | |--- market_segment_type_Offline > 0.50 | | | | | |--- weights: [49.83, 3.35] class: 0 | |--- avg_price_per_room > 100.04 | | |--- arrival_month <= 11.50 | | | |--- no_of_special_requests <= 2.25 | | | | |--- weights: [0.00, 1412.36] class: 1 | | | |--- no_of_special_requests > 2.25 | | | | |--- weights: [10.23, 0.00] class: 0 | | |--- arrival_month > 11.50 | | | |--- no_of_special_requests <= 0.50 | | | | |--- weights: [15.51, 0.00] class: 0 | | | |--- no_of_special_requests > 0.50 | | | | |--- arrival_date <= 24.50 | | | | | |--- weights: [1.65, 0.00] class: 0 | | | | 
|--- arrival_date > 24.50 | | | | | |--- weights: [1.65, 10.05] class: 1
Observations from the tree
Interpretations from other decision rules can be made similarly.
# importance of features in the tree building ( The importance of a feature is computed as the
# (normalized) total reduction of the 'criterion' brought by that feature. It is also known as the Gini importance )
print(
    pd.DataFrame(
        estimator.feature_importances_, columns=["Imp"], index=X_train.columns
    ).sort_values(by="Imp", ascending=False)
)
# Here we will see that importance of features has increased
| | Imp |
|---|---|
| lead_time | 0.391994 |
| market_segment_type_Online | 0.134938 |
| no_of_special_requests | 0.124633 |
| avg_price_per_room | 0.116906 |
| arrival_month | 0.059427 |
| arrival_date | 0.033705 |
| no_of_weekend_nights | 0.027518 |
| no_of_week_nights | 0.027200 |
| no_of_adults_1 | 0.020433 |
| arrival_year_2018 | 0.019914 |
| market_segment_type_Offline | 0.013002 |
| required_car_parking_space_1 | 0.010391 |
| no_of_adults_2 | 0.006363 |
| type_of_meal_plan_Not_Selected | 0.005112 |
| no_of_adults_3 | 0.001498 |
| room_type_reserved_Room_Type 2 | 0.001465 |
| room_type_reserved_Room_Type 4 | 0.001202 |
| market_segment_type_Corporate | 0.000843 |
| no_of_children_1 | 0.000758 |
| type_of_meal_plan_Meal_Plan_2 | 0.000684 |
| room_type_reserved_Room_Type 5 | 0.000597 |
| room_type_reserved_Room_Type 6 | 0.000369 |
| repeated_guest_1 | 0.000353 |
| room_type_reserved_Room_Type 7 | 0.000348 |
| no_of_children_2 | 0.000345 |
| room_type_reserved_Room_Type 3 | 0.000000 |
| type_of_meal_plan_Meal_Plan_3 | 0.000000 |
| no_of_children_9 | 0.000000 |
| no_of_children_3 | 0.000000 |
| market_segment_type_Complementary | 0.000000 |
| no_of_adults_4 | 0.000000 |
| no_of_previous_bookings_not_canceled | 0.000000 |
| no_of_previous_cancellations | 0.000000 |
| no_of_children_10 | 0.000000 |
# plotting the important features
importances = estimator.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
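The comment above describes Gini importance as the normalized total reduction of the criterion brought by each feature. The sketch below recomputes that quantity by hand from a fitted tree's internal arrays and compares it with `feature_importances_`. It uses a small synthetic dataset, not the hotel data, and the names `X_demo`, `y_demo`, `demo_tree` and `manual_imp` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration (not the hotel bookings dataset)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=5, n_informative=3, random_state=1
)
demo_tree = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X_demo, y_demo)

t = demo_tree.tree_
manual_imp = np.zeros(X_demo.shape[1])
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:  # leaves have no children and contribute nothing
        continue
    # weighted impurity of the node minus the weighted impurity of its children
    decrease = (
        t.weighted_n_node_samples[node] * t.impurity[node]
        - t.weighted_n_node_samples[left] * t.impurity[left]
        - t.weighted_n_node_samples[right] * t.impurity[right]
    )
    manual_imp[t.feature[node]] += decrease

# normalize so that the importances sum to 1
manual_imp /= manual_imp.sum()
print(np.allclose(manual_imp, demo_tree.feature_importances_))
```

Because the values are normalized, a feature that is never used for a split gets an importance of exactly 0, which is why many dummy columns show 0.000000 in the output above.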
The DecisionTreeClassifier provides parameters such as
min_samples_leaf and max_depth to prevent a tree from overfitting. Cost
complexity pruning provides another option to control the size of a tree. In
DecisionTreeClassifier, this pruning technique is parameterized by the
cost complexity parameter, ccp_alpha. Greater values of ccp_alpha
increase the number of nodes pruned. Here we only show the effect of
ccp_alpha on regularizing the trees and how to choose a ccp_alpha
based on validation scores.
Minimal cost complexity pruning recursively finds the node with the "weakest
link". The weakest link is characterized by an effective alpha, where the
nodes with the smallest effective alpha are pruned first. To get an idea of
what values of ccp_alpha could be appropriate, scikit-learn provides
DecisionTreeClassifier.cost_complexity_pruning_path that returns the
effective alphas and the corresponding total leaf impurities at each step of
the pruning process. As alpha increases, more of the tree is pruned, which
increases the total impurity of its leaves.
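For reference, the quantities described above can be written down explicitly. This is the standard formulation of minimal cost complexity pruning as given in the scikit-learn documentation; the symbols are not defined elsewhere in this notebook. Here $R(T)$ is the total sample-weighted impurity of the leaves of tree $T$, $|\widetilde{T}|$ is its number of leaves, and $T_t$ is the subtree rooted at node $t$.

```latex
% Cost-complexity measure that pruning minimizes for a given alpha
R_\alpha(T) = R(T) + \alpha \, |\widetilde{T}|

% Effective alpha of an internal node t: the value of alpha at which
% collapsing T_t into the single leaf t no longer increases R_\alpha
\alpha_{\mathrm{eff}}(t) = \frac{R(t) - R(T_t)}{|\widetilde{T}_t| - 1}
```

The node with the smallest $\alpha_{\mathrm{eff}}$ is the "weakest link" and is pruned first, which is why the impurity column returned by the pruning path never decreases as alpha grows.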
# setting the parameters of the cost complexity pruner
clf = DecisionTreeClassifier(random_state=1, class_weight={0: 0.33, 1: 0.67})
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
# creating a dataframe for the ccp_alphas vs impurities
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.000000e+00 | 0.008376 |
| 1 | 0.000000e+00 | 0.008376 |
| 2 | 1.958732e-20 | 0.008376 |
| 3 | 1.958732e-20 | 0.008376 |
| 4 | 1.958732e-20 | 0.008376 |
| ... | ... | ... |
| 1685 | 8.886770e-03 | 0.328000 |
| 1686 | 9.831603e-03 | 0.337832 |
| 1687 | 1.273174e-02 | 0.350564 |
| 1688 | 3.410153e-02 | 0.418767 |
| 1689 | 8.123211e-02 | 0.499999 |
1690 rows × 2 columns
# plotting total impurity vs effective alphas
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
Next, we train a decision tree using the effective alphas. The last value
in ccp_alphas is the alpha value that prunes the whole tree,
leaving the tree, clfs[-1], with one node.
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(
        random_state=1, ccp_alpha=ccp_alpha, class_weight={0: 0.33, 1: 0.67}
    )
    clf.fit(X_train, y_train)
    clfs.append(clf)
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        clfs[-1].tree_.node_count, ccp_alphas[-1]
    )
)
Number of nodes in the last tree is: 1 with ccp_alpha: 0.08123211203866026
For the remainder, we remove the last element in
clfs and ccp_alphas, because it is the trivial tree with only one
node. Here we show that the number of nodes and the tree depth decrease as alpha
increases.
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
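The relationship plotted above can also be checked numerically. The following is a minimal sketch on synthetic data (not the hotel data), assuming the same pattern of refitting one tree per effective alpha; `X_demo`, `y_demo`, `path_demo` and `node_counts_demo` are illustrative names.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration
X_demo, y_demo = make_classification(n_samples=400, n_features=6, random_state=1)
path_demo = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(
    X_demo, y_demo
)

node_counts_demo = []
for alpha in path_demo.ccp_alphas:
    # max(..., 0.0) guards against tiny negative alphas from floating point error
    pruned = DecisionTreeClassifier(random_state=1, ccp_alpha=max(alpha, 0.0))
    pruned.fit(X_demo, y_demo)
    node_counts_demo.append(pruned.tree_.node_count)

# True if the node count never goes up as alpha increases
print(all(a >= b for a, b in zip(node_counts_demo, node_counts_demo[1:])))
```

Fixing `random_state` matters here: each refit grows the same full tree, so the only difference between the fitted models is how much of it is pruned.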
recall_train = []  # recall of each pruned tree on the training set
for clf in clfs:
    pred_train = clf.predict(X_train)
    values_train = recall_score(y_train, pred_train)
    recall_train.append(values_train)
recall_test = []  # recall of each pruned tree on the test set
for clf in clfs:
    pred_test = clf.predict(X_test)
    values_test = recall_score(y_test, pred_test)
    recall_test.append(values_test)
# calculation of the test and training scores
train_scores = [clf.score(X_train, y_train) for clf in clfs]
test_scores = [clf.score(X_test, y_test) for clf in clfs]
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("Recall")
ax.set_title("Recall vs alpha for training and testing sets")
ax.plot(
ccp_alphas, recall_train, marker="o", label="train", drawstyle="steps-post",
)
ax.plot(ccp_alphas, recall_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
Test recall peaks at an alpha of about 0.0001, but the tree at that alpha is too complex to interpret. Choosing an alpha of 0.006 instead gives a much simpler tree while keeping recall reasonably high.
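In this notebook the alpha of 0.006 is picked by reading the plot. As an alternative, the same trade-off can be made explicit in code: take the largest alpha whose recall stays within a chosen tolerance of the best recall. The helper below is a hypothetical sketch, not the method used above, and the numbers fed to it are made up for illustration.

```python
import numpy as np


def largest_alpha_within_tolerance(alphas, recalls, tolerance=0.02):
    """Hypothetical helper: largest alpha whose recall is within
    `tolerance` of the best recall in `recalls`."""
    alphas = np.asarray(alphas, dtype=float)
    recalls = np.asarray(recalls, dtype=float)
    good_enough = recalls >= recalls.max() - tolerance
    return alphas[good_enough].max()


# Made-up values for illustration only (not the notebook's results)
demo_alphas = [0.0, 0.0001, 0.001, 0.006, 0.03]
demo_recalls = [0.80, 0.86, 0.85, 0.77, 0.60]

chosen = largest_alpha_within_tolerance(demo_alphas, demo_recalls, tolerance=0.02)
print(chosen)
```

A wider tolerance accepts a larger alpha and therefore a smaller tree, so the tolerance is where the preference between recall and interpretability is stated.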
# selecting the model with the highest test recall
index_best_model = np.argmax(recall_test)
best_model = clfs[index_best_model]
print(best_model)
DecisionTreeClassifier(ccp_alpha=0.00011408531261741304,
class_weight={0: 0.33, 1: 0.67}, random_state=1)
best_model.fit(X_train, y_train)
DecisionTreeClassifier(ccp_alpha=0.00011408531261741304,
class_weight={0: 0.33, 1: 0.67}, random_state=1)
confusion_matrix_sklearn(best_model, X_train, y_train)
print("Recall Score:", get_recall_score(best_model, X_train, y_train))
Recall Score: 0.9194069113954323
confusion_matrix_sklearn(best_model, X_test, y_test)
print("Recall Score:", get_recall_score(best_model, X_test, y_test))
Recall Score: 0.862862010221465
# code to plot the decision tree
plt.figure(figsize=(5, 5))
out = tree.plot_tree(
    best_model,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
Creating a model with 0.006 alpha
# setting ccp_alpha = 0.006
best_model2 = DecisionTreeClassifier(
ccp_alpha=0.006, class_weight={0: 0.33, 1: 0.67}, random_state=1
)
best_model2.fit(X_train, y_train)
DecisionTreeClassifier(ccp_alpha=0.006, class_weight={0: 0.33, 1: 0.67},
random_state=1)
confusion_matrix_sklearn(best_model2, X_train, y_train)
# calculation of the scores
decision_tree_postpruned_perf_train = get_recall_score(best_model2, X_train, y_train)
print("Recall Score:", decision_tree_postpruned_perf_train)
Recall Score: 0.7746024154011718
# creating the confusion matrix
confusion_matrix_sklearn(best_model2, X_test, y_test)
# calculation of the scores
decision_tree_postpruned_perf_test = get_recall_score(best_model2, X_test, y_test)
print("Recall Score:", decision_tree_postpruned_perf_test)
Recall Score: 0.7731402612152186
# plotting the decision tree
plt.figure(figsize=(15, 10))
out = tree.plot_tree(
    best_model2,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=True,
)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of a decision tree -
print(tree.export_text(best_model2, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- lead_time <= 90.50
|   |   |   |   |--- weights: [1251.36, 239.19] class: 0
|   |   |   |--- lead_time > 90.50
|   |   |   |   |--- weights: [271.26, 284.08] class: 1
|   |   |--- market_segment_type_Online > 0.50
|   |   |   |--- lead_time <= 13.50
|   |   |   |   |--- weights: [358.05, 219.76] class: 0
|   |   |   |--- lead_time > 13.50
|   |   |   |   |--- weights: [468.27, 1634.80] class: 1
|   |--- no_of_special_requests > 0.50
|   |   |--- weights: [2819.19, 804.00] class: 0
|--- lead_time > 151.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- weights: [422.40, 998.97] class: 1
|   |--- avg_price_per_room > 100.04
|   |   |--- weights: [29.04, 1422.41] class: 1
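The `weights` shown in each leaf are not raw booking counts. Assuming they are class-weighted sample totals (the exact meaning can vary with the scikit-learn version), dividing by the class weights used when fitting recovers approximate counts. A small sketch for the first leaf of the tree above:

```python
# Class weights used when fitting the trees in this notebook
class_weight = {0: 0.33, 1: 0.67}

# Weighted totals copied from the first leaf of the pruned tree
leaf_weights = [1251.36, 239.19]

# Assumption: each printed weight equals (number of samples) * (class weight)
approx_counts = [round(w / class_weight[c]) for c, w in enumerate(leaf_weights)]
print(approx_counts)  # [3792, 357]
```

Both divisions come out as whole numbers, which is consistent with the assumption. It also shows that the predicted class of a leaf is driven by the weighted totals, so class 1 can win a leaf where it is not the majority by raw count.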
Observation
If a booking has a lead time of at most 90.5 days, no special requests, and was not made online, it is likely to be kept.
If a booking has a lead time of at most 151.5 days and at least one special request, it is likely to be kept, whether or not it was made online.
If a booking has a lead time between 13.5 and 151.5 days, no special requests, and was made online, it is likely to be cancelled.
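Since the pruned tree has only seven leaves, its rules can be written as a plain function. The sketch below transcribes the text export of best_model2, with class 1 read as "cancelled" and class 0 as "kept"; the function name `pruned_tree_rule` and the boolean `is_online` argument are illustrative assumptions.

```python
def pruned_tree_rule(lead_time, no_of_special_requests, is_online):
    """Return 1 (predicted cancelled) or 0 (predicted kept), following
    the text export of the post-pruned tree."""
    if lead_time <= 151.5:
        if no_of_special_requests <= 0.5:
            if not is_online:
                # not online: kept only for lead times up to 90.5 days
                return 0 if lead_time <= 90.5 else 1
            # online: kept only for lead times up to 13.5 days
            return 0 if lead_time <= 13.5 else 1
        # at least one special request: kept
        return 0
    # lead_time > 151.5: both avg_price_per_room branches predict class 1
    return 1


print(pruned_tree_rule(lead_time=30, no_of_special_requests=0, is_online=True))
```

The room price does not appear as an argument because both branches of the avg_price_per_room split lead to the same predicted class.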
# importance of features in the tree building ( The importance of a feature is computed as the
# (normalized) total reduction of the 'criterion' brought by that feature. It is also known as the Gini importance )
print(
    pd.DataFrame(
        best_model2.feature_importances_, columns=["Imp"], index=X_train.columns
    ).sort_values(by="Imp", ascending=False)
)
| | Imp |
|---|---|
| lead_time | 0.568596 |
| market_segment_type_Online | 0.217324 |
| no_of_special_requests | 0.159728 |
| avg_price_per_room | 0.054353 |
| no_of_weekend_nights | 0.000000 |
| room_type_reserved_Room_Type 6 | 0.000000 |
| required_car_parking_space_1 | 0.000000 |
| room_type_reserved_Room_Type 2 | 0.000000 |
| room_type_reserved_Room_Type 3 | 0.000000 |
| room_type_reserved_Room_Type 4 | 0.000000 |
| room_type_reserved_Room_Type 5 | 0.000000 |
| arrival_year_2018 | 0.000000 |
| room_type_reserved_Room_Type 7 | 0.000000 |
| type_of_meal_plan_Meal_Plan_3 | 0.000000 |
| market_segment_type_Complementary | 0.000000 |
| market_segment_type_Corporate | 0.000000 |
| market_segment_type_Offline | 0.000000 |
| type_of_meal_plan_Not_Selected | 0.000000 |
| no_of_children_10 | 0.000000 |
| type_of_meal_plan_Meal_Plan_2 | 0.000000 |
| no_of_week_nights | 0.000000 |
| no_of_children_9 | 0.000000 |
| no_of_children_3 | 0.000000 |
| no_of_children_2 | 0.000000 |
| no_of_children_1 | 0.000000 |
| no_of_adults_4 | 0.000000 |
| no_of_adults_3 | 0.000000 |
| no_of_adults_2 | 0.000000 |
| no_of_adults_1 | 0.000000 |
| no_of_previous_bookings_not_canceled | 0.000000 |
| no_of_previous_cancellations | 0.000000 |
| arrival_date | 0.000000 |
| arrival_month | 0.000000 |
| repeated_guest_1 | 0.000000 |
importances = best_model2.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
# training performance comparison
models_train_comp_df = pd.DataFrame(
    [
        decision_tree_perf_train,
        decision_tree_tune_perf_train,
        decision_tree_postpruned_perf_train,
    ],
    columns=["Recall on training set"],
)
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Recall on training set |
|---|---|
| 0 | 0.995097 |
| 1 | 0.904819 |
| 2 | 0.774602 |
# testing performance comparison
models_test_comp_df = pd.DataFrame(
    [
        decision_tree_perf_test,
        decision_tree_tune_perf_test,
        decision_tree_postpruned_perf_test,
    ],
    columns=["Recall on testing set"],
)
print("Test performance comparison:")
models_test_comp_df
Test performance comparison:
| | Recall on testing set |
|---|---|
| 0 | 0.803805 |
| 1 | 0.865701 |
| 2 | 0.773140 |
data = {
    "Logistic Regression Default Threshold": log_reg_model_test_perf.T.loc["Recall"][0],
    "Logistic Regression-0.37 Threshold": log_reg_model_test_perf_threshold_auc_roc.T.loc["Recall"][0],
    "Logistic Regression-0.41 Threshold": log_reg_model_test_perf_threshold_curve.T.loc["Recall"][0],
    "Original_Pruned Tree": [decision_tree_perf_test],
    "Pre-Pruned": [decision_tree_tune_perf_test],
    "Post_Pruned": [decision_tree_postpruned_perf_test],
}
models_test_comp_df = pd.DataFrame(data, index=["Recall Values"])
print("Test set performance comparison:")
models_test_comp_df
Test set performance comparison:
| | Logistic Regression Default Threshold | Logistic Regression-0.37 Threshold | Logistic Regression-0.41 Threshold | Original_Pruned Tree | Pre-Pruned | Post_Pruned |
|---|---|---|---|---|---|---|
| Recall Values | 0.633163 | 0.737649 | 0.702726 | 0.803805 | 0.865701 | 0.77314 |
Observations
The Logistic Regression model with the 0.37 threshold (optimized from the ROC curve) gave the highest recall of the Logistic Regression models.
While the Pre-Pruned Tree gave the highest recall on the test data, the tree is too complex.
The Post-Pruned Tree is manageable in size and has a recall of 0.77, which is higher than that of the best Logistic Regression model (i.e. the one with the 0.37 threshold).
Based on the plot above the average room prices per market segment are
As we can see, Online bookings are the most expensive. Since 71.3% of the cancelled bookings came from the Online segment, these cancellations have a significant impact on revenue.
Based on the plot above, we can see that 32.8% of the bookings are cancelled.
Additionally, we can see from the plot below that 71.3% of the 11885 cancelled bookings come from the Online segment. Hence the need for a strict policy on the cancellation of Online bookings.
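The two percentages quoted above (overall cancellation rate and the share of cancellations per segment) are simple to compute with pandas. The sketch below uses a tiny made-up frame, not the INN data; the column names `market_segment_type` and `booking_status` and the label values are assumed to match the original dataset.

```python
import pandas as pd

# Toy data, made up for illustration only
toy = pd.DataFrame(
    {
        "market_segment_type": [
            "Online", "Online", "Offline", "Online", "Corporate", "Offline",
        ],
        "booking_status": [
            "Canceled", "Not_Canceled", "Canceled",
            "Canceled", "Not_Canceled", "Not_Canceled",
        ],
    }
)

# overall share of bookings that were cancelled, in percent
overall_rate = round((toy["booking_status"] == "Canceled").mean() * 100, 1)

# among cancelled bookings only, the percentage coming from each segment
cancelled = toy[toy["booking_status"] == "Canceled"]
share_by_segment = (
    cancelled["market_segment_type"].value_counts(normalize=True).mul(100).round(1)
)

print(overall_rate)
print(share_by_segment)
```

Note that the two figures use different denominators: the first divides by all bookings, the second by cancelled bookings only.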
## Here we find the number of repeated guests that have canceled by applying a filter on the original data set
Cancellled_bookings = INN[INN["booking_status"] == "Canceled"]
Cancelled_repeat = Cancellled_bookings[Cancellled_bookings["repeated_guest"] == 1]
Total_repeated_guest = INN[INN["repeated_guest"] == 1]
# share of repeated guests whose booking was canceled
percentage = round((len(Cancelled_repeat) / len(Total_repeated_guest)) * 100, 2)
print("The percentage of repeated guest that canceled is :", percentage, "%")
The percentage of repeated guest that canceled is : 1.72 %
Cancellled_bookings = INN[INN["booking_status"] == "Canceled"]
labeled_barplot(Cancellled_bookings, "no_of_special_requests", perc=False)
Observations
1 special requests attached to them.
2 special requests attached to them.
Decision Tree Summary
Scenario 1 - If the booking lead time is less than 5 months (151 days), the deciding factors for whether a booking is cancelled are
If there are no special requests and it is not an online booking
If there are no special requests and it is an online booking
Scenario 2 - If the lead time is less than 5 months and special requests are made, the booking is expected to be kept, regardless of the segment it was made from.
Scenario 3 - If the lead time is more than 5 months, the booking is likely to be cancelled, whether the room price is above or below 100 euros.
Regression Model Summary
Based on the regression, factors such as being a repeated guest, having special requests, booking through the Offline or Corporate segments, and requiring a parking space decreased the odds of a booking being cancelled.
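In a logistic regression, a coefficient changes the odds by a factor of exp(coefficient), so a negative coefficient corresponds to a percentage decrease in the odds of cancellation. The sketch below shows that conversion. The function name `pct_change_in_odds` is illustrative, and the demo coefficient is constructed to give an odds ratio of 0.17; it is not taken from the fitted model.

```python
import math


def pct_change_in_odds(coef):
    """Percentage change in the odds for a one-unit increase in a feature."""
    return (math.exp(coef) - 1) * 100


# Illustrative coefficient only: chosen so that the odds ratio is 0.17
demo_coef = math.log(0.17)
print(round(pct_change_in_odds(demo_coef), 2))  # -83.0
```

A value of -83 reads as "the odds of cancellation are 83% lower", which is a statement about odds, not about the probability of cancellation.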
Some of these findings are consistent with the decision tree, which indicated, for example, that if the lead time was less than 5 months and special requests were made, the booking was expected to be kept regardless of the booking method.
The decision tree also indicated that if a booking is made through any segment other than Online and the lead time is less than 3 months, the booking is expected to be kept. This is consistent with the regression model, which indicated an 83% and 52% decrease in the odds of a booking being cancelled if the booking was made Offline or Corporate respectively. Based on the intersection between the results of the decision tree and the logistic regression model, the following policy for cancellations and refunds seems appropriate.
Policy Summary
Cancellations
Cancellation Penalty fees will be applied in the following situations
* If a booking is made from other segments (excluding Online) with a lead time of less than 3 months and there are no special requests.
* If a booking is made online with a lead time of less than 13.5 days and there are no special requests.
* If the lead time is less than 5 months and special requests are made, regardless of whether the booking was made online or not.
Penalty fees do not apply if bookings are cancelled more than 151 days (5 months) in advance.
Refunds